[evlatests] Monitor Database

Barry Clark bclark at aoc.nrao.edu
Wed Sep 7 19:12:30 EDT 2005


From Pat Vanbuskirk:

> We're trying to sort out how long we should keep the monitor data online 
> in the database. We're still growing by about 4.5G per week and the rate 
> will increase as we add antennas and monitor points.

During the week of August 31 to September 6, inclusive, the mpts tables
totaled 29,896,499 rows.  The database pruner's catchup jobs that week
pruned tables from 041217 to 050112, deleting 14,029,287 rows.
So we should have had a net increase of about 16 M rows.  I don't
understand how Oracle stores its data, but the working part of an mpts
table row looks like about 100 bytes (three varchars maxing at 70 bytes,
plus three numbers).  So if we're growing at 4.5 G per week, we must
be carrying about a factor of three in overhead (unoccupied leaves of a
btree, indices, backups, etc.).  What sort of backups are we keeping?
Where does all that disk go?
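
For concreteness, here is that arithmetic as a quick Python sketch (the
100 bytes/row working size is an estimate from the column types, not a
measured figure):

    # Back-of-envelope check of the overhead factor.
    rows_added    = 29896499     # mpts rows filled, Aug 31 - Sep 6
    rows_pruned   = 14029287     # rows deleted by the catchup jobs
    bytes_per_row = 100          # estimated: three varchars (~70 B) + three numbers

    net_rows  = rows_added - rows_pruned    # ~15.9 M rows
    raw_bytes = net_rows * bytes_per_row    # ~1.6 GB of payload
    observed  = 4.5e9                       # ~4.5 GB/week of disk growth
    print(observed / raw_bytes)             # ~2.8, i.e. about a factor of three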

I've been running the pruner catchup very slowly because even in January
there is the rare big day, and it gets very confusing if the catchup jobs
start stumbling all over each other.  But before we start talking about
putting data offline, we should accelerate the pruner.  If things are
desperate, I can have the catchup jobs scheduled around the clock while
I'm gone for a few days this month.
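
If we do go to round-the-clock scheduling, the simplest guard against
catchup jobs stumbling over each other is an exclusive lock, so a newly
launched job just exits if one is already running.  A minimal sketch in
Python; the script name and lock path are made-up placeholders, not our
actual tooling:

    # Serialize catchup runs: only one job touches the tables at a time.
    # 'run_catchup' and the lock path are hypothetical.
    import fcntl, subprocess, sys

    with open("/var/tmp/pruner-catchup.lock", "w") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            sys.exit(0)    # a previous catchup job is still running
        subprocess.run(["run_catchup"], check=True)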

As I recall, the database growth rate picked up a lot around May.  At
its current pace, the pruner will get there in about a month, and then
the disk usage should start to stabilize.  But 18 GB of growth between
now and then does sound like a lot.
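
For the record, the rough timetable behind those numbers (the backlog
figure is my reading of the dates above):

    # Catchup pace: 041217 -> 050112 pruned in one week of running,
    # i.e. roughly four weeks of tables per week.
    tables_per_week = 4.0
    backlog_weeks   = 16.0                                 # mid-January to May
    weeks_to_catch_up = backlog_weeks / tables_per_week    # ~4 weeks
    extra_disk_gb = weeks_to_catch_up * 4.5                # ~18 GB in the meantime
    print(weeks_to_catch_up, extra_disk_gb)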

There's also the question of how fast the filler can run.  It is known
to lose data when a big data source is in Socorro.  (The mechanism is that
the routers buffer the traffic, so that when it arrives, it arrives in big
chunks, which may overflow the filler's network buffers.)  This suggests
that the current 30 M rows per week = 50 transactions/second may be getting
to within a factor of a few of what the filler can handle.  We will
probably have to throttle the data from the sources fairly seriously
to keep the data rate down to not much more than the present level.
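
The 50/second figure is just the weekly row count spread evenly over the
week, assuming roughly one row per transaction:

    # 30 M rows/week spread evenly over a week.
    rows_per_week = 30e6
    seconds_per_week = 7 * 24 * 3600           # 604800
    print(rows_per_week / seconds_per_week)    # ~49.6, call it 50/second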

The pruner is currently cutting the sizes of tables by a factor of about
10.  No guesses as to how that might go with more recent data.  I'm thinking
in terms of an even more severe pruning after a year of age, and would
appreciate suggestions for things that might have very long-term value.
I believe the goal of keeping 30 days of unpruned data on-line is reasonable.
With Pat's rate of 300 bytes per net row, that's 50 G in 30 days of
unpruned tables, another 50 G in a year of pruned tables, and a smaller
amount in older stuff.  That sounds not too bad in the long run, even if we
find the database filler can support another factor of two or three.
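
A rough check of those figures (on paper the 30-day number comes out
nearer 40 G; I am rounding up to allow for growth):

    # Long-run disk budget at ~300 bytes per net row, overhead included.
    gb_per_week  = 30e6 * 300 / 1e9         # ~9 GB/week of unpruned tables
    unpruned_30d = gb_per_week * 30 / 7     # ~39 GB; call it 50 with growth
    pruned_year  = gb_per_week / 10 * 52    # ~47 GB at the factor-of-10 pruning
    print(unpruned_30d, pruned_year)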


