[daip] data area optimization

Eric Greisen egreisen at nrao.edu
Tue Dec 20 11:17:23 EST 2005


I wish I knew how best to advise you.  The slowest part of a Clean is
the gridded subtraction.  In that operation a disk file containing the
residual data is read and re-written while a small disk file
containing the grid is first fully written and then read.  If you have
a lot of spare memory, that smaller file may actually reside in memory
and never spend any real time on disk.  The master files (the input
uv data and the output image) and this work file could reside on the
same disk with little loss, since the master files are accessed only
at the beginning
and end.  Splitting the IFs makes a lot of sense since you will be
calibrating and imaging them separately.  It would not be good to have
two programs, on the same or different computers, try to use the same
disk at the same time.  This would lead to head contention, which
slows things down in a very non-linear way.
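
If it helps to see the effect, a small timing sketch along the
following lines (in Python; not an AIPS tool, and the /data1 and
/data2 paths are only placeholders for the data areas you are
considering) writes and re-reads two large scratch files, first one
after the other and then from two processes at once.  If both paths
live on the same physical disk, the concurrent pass usually takes far
longer than the two sequential passes combined, which is the head
contention at work.

# contention_test.py -- rough sketch, not part of AIPS.
import os
import time
from multiprocessing import Process

PATHS = ["/data1/scratch_a.tmp", "/data2/scratch_b.tmp"]  # placeholder data areas
SIZE_MB = 512                       # size of each scratch file
CHUNK = b"\0" * (1 << 20)           # 1 MB of zeros

def write_and_read(path):
    """Write a large scratch file sequentially, read it back, remove it."""
    with open(path, "wb") as f:
        for _ in range(SIZE_MB):
            f.write(CHUNK)
    with open(path, "rb") as f:
        while f.read(1 << 20):
            pass
    os.remove(path)

def timed(label, func):
    t0 = time.time()
    func()
    print("%-12s %6.1f s" % (label, time.time() - t0))

def sequential():
    # One file after the other: no competition for the disk heads.
    for p in PATHS:
        write_and_read(p)

def concurrent():
    # Both files at once: two processes fight over the same disk
    # if the two paths are on the same spindle.
    procs = [Process(target=write_and_read, args=(p,)) for p in PATHS]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    timed("sequential", sequential)
    timed("concurrent", concurrent)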

It is also important to have the disks be local to the computer that
is using them.  In this way, Linux may choose to bring the disk up to
date only sporadically, with a great improvement in performance.
If NFS or some other I/O system gets involved, then the operating
system will actually insist on doing what the software tells it.  I
have seen NFS cost us 1 second per message in DDT (more than 2000
seconds in all), where Unix systems with local disks can do the whole
run in very much less time.
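
A crude way to see how much that write-back caching buys you on a
given file system is to time the same stream of writes with and
without forcing each block out to disk.  This is only a sketch, not
an AIPS utility, and the /data1 path is a placeholder; on a local
disk the buffered case should be very much faster, while on an NFS
mount the gap tends to shrink because the client has to push the data
to the server anyway.

# sync_test.py -- rough sketch, not part of AIPS.
import os
import time

PATH = "/data1/sync_test.tmp"       # placeholder data area to test
CHUNK = b"\0" * (1 << 20)           # 1 MB of zeros
N_CHUNKS = 256                      # 256 MB total

def write_file(force_sync):
    """Write the test file; optionally fsync after every chunk."""
    t0 = time.time()
    with open(PATH, "wb") as f:
        for _ in range(N_CHUNKS):
            f.write(CHUNK)
            if force_sync:
                f.flush()
                os.fsync(f.fileno())   # insist the data really reaches the disk
    elapsed = time.time() - t0
    os.remove(PATH)
    return elapsed

if __name__ == "__main__":
    print("buffered writes: %6.1f s" % write_file(False))
    print("fsync'd writes:  %6.1f s" % write_file(True))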

Before deciding to shadow your disk files, you might want to do some
timing tests.  It is important to shadow the master files on some time
scale, but UVCOP, SUBIM, and TACOP could do that in your scripts.  I
have seen shadowing I/O systems slow things down seriously, and you
really do not care if the IMAGR residual work file and the gridded
subtraction files are lost occasionally.  They contain nothing of
long-term value.
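
For the timing tests themselves, something as simple as the sketch
below is usually enough: it reports a rough write and read rate for
each candidate data area, so you can compare a shadowed area against
a plain one, or a local disk against an NFS mount.  Again this is
only an illustration, and the /data1 and /data2 entries are
placeholders.

# area_rate.py -- rough sketch, not part of AIPS.
import os
import time

AREAS = ["/data1", "/data2"]        # placeholder data areas to compare
SIZE_MB = 512
CHUNK = b"\0" * (1 << 20)           # 1 MB of zeros

def rate(path):
    """Return (write, read) rates in MB/s for one data area."""
    tmp = os.path.join(path, "rate_test.tmp")
    t0 = time.time()
    with open(tmp, "wb") as f:
        for _ in range(SIZE_MB):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())        # make sure the write really hit the disk
    t_write = time.time() - t0
    t0 = time.time()
    with open(tmp, "rb") as f:      # note: the read-back may be served from
        while f.read(1 << 20):      # the page cache rather than the disk
            pass
    t_read = time.time() - t0
    os.remove(tmp)
    return SIZE_MB / t_write, SIZE_MB / t_read

if __name__ == "__main__":
    for area in AREAS:
        w, r = rate(area)
        print("%-10s  write %6.0f MB/s   read %6.0f MB/s" % (area, w, r))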

That's about all I can think of at the moment.  Memory for I/O
buffering and local disks are the most important.  You might want to
consider the "pseudo-AP" size too.  If you are doing very large
images, a larger AP might improve the FFTs, but if the images are
individually smaller, then a large AP costs you in memory management.
Parameters such as MAXPIXEL are quite significant for performance as
well.

Note that the binary installation runs about 30% faster on Pentium
IVs than the version built with the GNU compilers.

Perhaps Pat or others might also chime in on this reply.

Eric Greisen



