[daip] problem

Eric Greisen egreisen at nrao.edu
Wed Sep 24 10:51:16 EDT 2008


Tony Beasley wrote:
> 
> hi Eric, have a question for you, sure you know the answer..
> 
> How many times is a visibility accessed during the course of a normal 
> reduction? Assumption: this is a set of "calibrated visibilities", i.e. 
> a setjy/calib/getjy cycle was run to produce a set of assoc. cal tables.
> 
> So data with 1st-estimate amp+phase cal tables... sitting on disk...
> running through cycles of imaging, self-calibration etc. How many times 
> is the entire dataset "read" from disk, in what steps (gridding, 
> imaging, cleaning, etc)?
> 
> Reason I ask... I keep hearing that gridding and reading the datasets 
> will be the real limitation for the low-frequency datasets in future, 
> but has to be done because nothing faster than FFTs. Want to really 
> understand the problem. Is gridding a serious limitation now? Is it the 
> tile/facet requirement at low freq that generates the problem?
> 
> And let me turn the issue around another way... if I said to you - 
> starting from scratch, could a reduction system be written to enable one 
> pass (ONLY) through the data set to get the same quality result? What 
> would be required - complete load of data to memory? Ultimately, 
> clean/self-cal are iterative, maybe multiple reads required. Your thoughts?

You don't want to know much, do you!

I don't know about CASA, so this will be solely from my AIPS experience. 
The gridding process requires a read through the data - until today, 
once for each facet.  AIPS now uses more memory and grids as many facets 
as possible in one pass.  This costs significant CPU but saves a great 
deal of IO.  Whether that matters depends on the size of scratch memory 
relative to the size of the data set.  If the machine can devote more 
scratch memory to the UV data set than the data set requires, then 
things are fast.  If it cannot, things are abruptly much slower - hence 
the change in the AIPS code, which is of little use when the data fit in 
scratch RAM.  I was always under the impression that modeling - i.e., 
Clean component subtraction/division - was the bottleneck, so some 
months ago I changed the AIPS code to use a lot of memory there too, so 
that multiple facets and/or multiple channels (when the model is allowed 
to be channel dependent due to the primary beam or known spectral 
indices) can be done at once to reduce the IO count.
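To make the IO structure concrete, here is a toy sketch (my illustration, not the AIPS code) of gridding several facets in a single pass through the visibilities.  It uses nearest-neighbour cell assignment and a simple phase rotation per facet; a real gridder convolves each sample with a tapering kernel and handles w-terms, but the point is only that one read of the data can update every facet grid:

```python
import numpy as np

def grid_facets(uvw, vis, facet_centers, n=256, cell=1.0):
    """Toy multi-facet gridder: ONE pass over the visibility records
    updates a grid for EVERY facet, instead of one read pass per facet.
    Nearest-neighbour gridding only; real gridders use a convolving
    kernel (e.g. a spheroidal function).  All names here are made up
    for illustration."""
    grids = [np.zeros((n, n), dtype=complex) for _ in facet_centers]
    for (u, v, w), V in zip(uvw, vis):          # the single read pass
        for g, (l0, m0) in zip(grids, facet_centers):
            # Phase-rotate the sample to this facet's centre (l0, m0).
            Vrot = V * np.exp(-2j * np.pi * (u * l0 + v * m0))
            iu = int(round(u / cell)) % n
            iv = int(round(v / cell)) % n
            g[iv, iu] += Vrot
    return grids

# Each facet's dirty image is then just an FFT of its grid, e.g.
#   image = np.fft.fftshift(np.fft.ifft2(grid)).real
```

Holding all the facet grids in memory at once is exactly the memory-for-IO trade described above.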

The reads for regular calibration are trivial, and self-cal is part of 
the CC modeling.
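A minimal sketch of why applying a gain table on the fly is cheap (my illustration; the function name and array layout are assumptions, not the AIPS interface).  With antenna-based complex gains, the calibrated visibility on baseline (i, j) is the observed one divided by g_i * conj(g_j) - a couple of complex operations per record, negligible next to the cost of reading the record:

```python
import numpy as np

def apply_gains(vis, ant1, ant2, gains):
    """Apply antenna-based complex gains to visibilities:
    V_cal(i, j) = V_obs(i, j) / (g_i * conj(g_j)).
    ant1/ant2 are the antenna indices of each record."""
    return vis / (gains[ant1] * np.conj(gains[ant2]))
```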

One of the worst cases I've seen was a spectral-line job in which IMAGR 
read the full 127-channel data set just to make a copy of the one 
desired channel at that point in the loop, and then imaged that channel. 
With a 7-Gbyte data set, that read took many times longer than the full 
processing of the channel afterwards.  When the problem was moved to an 
empty machine with 16 Gbytes of RAM, AIPS did the whole dirty cube in 
about 20 minutes instead of several hours.
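The back-of-the-envelope IO count for that case, using the numbers above:

```python
# Spectral-line worst case: re-reading the whole set once per channel.
nchan = 127
dataset_gb = 7.0

rereads_gb = nchan * dataset_gb   # full-set read repeated per channel
cached_gb = dataset_gb            # with enough RAM: read from disk once

print(rereads_gb)                 # 889.0 GB of reads
print(rereads_gb / cached_gb)     # 127x reduction when the set is cached
```

Once the 7-Gbyte set fits in the 16-Gbyte machine's memory, the 126 repeat reads come from cache rather than disk, which is where the hours-to-minutes speedup came from.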

So there is no good answer to your question and no magic bullets.

Eric



