[daip] data area optimization

Patrick P Murphy pmurphy at nrao.edu
Wed Dec 21 09:29:42 EST 2005


On Tue, 20 Dec 2005 09:17:23 -0700, Eric Greisen <egreisen at nrao.edu> said:

> Perhaps Pat or others might also chime in on this reply.

Sure.  

On Tue, 20 Dec 2005 12:38:07 +0100, "M.A.W.   Verheijen"
   <verheyen at astro.rug.nl> said: 

> I would like to ask your advice regarding the optimization of multiple
> data areas (2TB in total) which will be simultaneously accessed by AIPS. 

> First a little bit of background.  I have initiated a Large Programme at
> the Westerbork telescope to observe HI in distant galaxy clusters.  For
> this, I will fully exploit the new WSRT backend which provides 8 IFs,
> 256 channels per IF, and 2 polarizations.  Total integration time will
> be ~200x12 hours, spread over the coming 3 years.  To accommodate these
> data, I am about to order a dedicated multi-CPU data-server with 2 TB of
> disk space.  Data processing will probably be limited by data I/O, and
> not so much by CPU power. I imagine running multiple IMAGR tasks or
> SELFCAL scripts at the same time, accessing multiple AIPS data areas.

> I seem to remember that, in AIPS, data I/O benefits from having multiple
> physical drives, e.g.  2x250 GB is better than 1x500 GB, given the fact
> that AIPS uses temporary scratch files.  

There may be some benefit to this, but in order to avail of it, the
drives have to be distinct (i.e., not just separate partitions carved
off a single large RAID array or disk).  There's also the issue of type
of controller: SCSI or SATA.  While the latter is rapidly catching up,
the former is still the king of the performance hill, *especially* when
it comes to multiple processes accessing the same physical disk.  How
important this is will depend on usage patterns (e.g. if you plan on
having others do things on the computer when you're running, say, a
monster IMAGR that takes hours or days, you will likely want SCSI).
While it's true that multiple processes accessing data on the same
physical disk will slow things down, my experience is that SCSI copes
with this better than SATA (at least the SATA systems we've tried in the
past year or two).
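
A quick first sanity check on the "distinct drives" point can be sketched
in Python.  This is only an illustration under stated assumptions: the
function names and the example paths are hypothetical, and the check is
one-directional.  Two data areas that report the same device ID certainly
share a filesystem, but two areas with different device IDs may still be
partitions of the same physical disk or RAID volume, so the hardware
layout still has to be confirmed by hand.

```python
import os
from collections import defaultdict


def group_by_filesystem(paths, dev_of=None):
    """Group paths by the device ID of the filesystem they live on.

    dev_of is an optional lookup function (path -> device ID); by default
    it uses os.stat(path).st_dev, so the paths must exist.
    """
    if dev_of is None:
        dev_of = lambda p: os.stat(p).st_dev
    groups = defaultdict(list)
    for path in paths:
        groups[dev_of(path)].append(path)
    return dict(groups)


def shared_filesystems(paths, dev_of=None):
    """Return only the groups in which several paths share one filesystem."""
    groups = group_by_filesystem(paths, dev_of)
    return {dev: ps for dev, ps in groups.items() if len(ps) > 1}
```

For example, calling shared_filesystems() on a list of data area
directories (say /DATA/HOST_1 and /DATA/HOST_2, names made up here)
returns an empty dict when every area sits on its own filesystem.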

We have tried/used both SATA and SCSI systems at NRAO, and I'm still
inclined to lean somewhat towards the latter where performance is
important.  Clearly SATA has a big price advantage, but we have had a
few issues with it -- some of which may be more due to the manufacturer
than the technology -- and so I'm still hesitant to go the SATA route
unless enormous capacity at minimal cost is a top priority.

> The nature of the data (8 IFs) suggests that it would be best to store
> the data from each IF on its own physical hard disk.  Therefore, I
> thought that having 8 hard disks of 250GB each would be preferred.
> These disks would then be RAIDed or (software rsync) mirrored to 8
> other identical disks.

You'd need a hardware RAID controller with at least 8 channels for that.
This could be problematic; I've only ever had two separate channels in
the servers I've configured (usually one channel with 2 disks for the
system, and another with as many as can fit for the data).  But I could
be wrong.

> Before investing my money, I would like to ask your advice how best to
> configure the 2TB of disk space in order to optimize the data I/O for
> AIPS processing, and what kind of hardware this would require.  Would it
> be useful to assign part of memory to a (virtual) partition for scratch
> files?

I would probably be inclined to use RAID-1 for two smallish (18 or 36GB)
system disks, and then RAID-5 for one or two partitions for the data.
I'd also keep a (possibly hot) spare on hand for the data array(s).
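
The capacity arithmetic behind such a layout can be sketched as follows.
This is a back-of-the-envelope illustration, not a sizing tool: the
function names are invented here, sizes are in decimal GB, and it assumes
the textbook rules that a RAID-1 mirror offers the capacity of a single
member and a RAID-5 array of n equal disks offers n-1 disks' worth.
Controller overhead and filesystem formatting losses are ignored.

```python
def raid1_usable_gb(disk_gb, n_disks=2):
    """Usable capacity of a RAID-1 mirror: one disk's worth."""
    if n_disks < 2:
        raise ValueError("RAID-1 needs at least 2 disks")
    return disk_gb


def raid5_usable_gb(disk_gb, n_disks):
    """Usable capacity of RAID-5: one disk's worth goes to parity."""
    if n_disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return disk_gb * (n_disks - 1)


def raid5_disks_needed(target_gb, disk_gb):
    """Smallest number of equal disks giving at least target_gb in RAID-5."""
    data_disks = -(-target_gb // disk_gb)  # ceiling division
    return max(3, data_disks + 1)          # +1 disk's worth for parity
```

With 250GB disks, raid5_disks_needed(2000, 250) gives 9, so 2TB of usable
space takes 9 disks in a single RAID-5 array (10 with a hot spare),
against 16 disks for 8 data disks each mirrored to an identical partner.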

> Would it be advantageous to distribute AIPS data areas or partitions
> over multiple hard drives? What sort of RAID system would you
> recommend?

Eric addressed the first question already.  We have had good experience
with Dell systems and PERC RAID arrays; that's what is in use on our
"AIPS Caige" systems here in CV.  Clearly we can't endorse any vendor,
but we can say what we have and what's worked well for us.

Hope this helps.

 - Pat

-- 
 Patrick P. Murphy, Ph.D.              Head, NRAO WebAdmin Working Group
 NRAO Computing Security Manager        Division Head, NRAO/CV Computing
 Home: http://goof.com/~pmurphy/     Work: http://www.nrao.edu/~pmurphy/
 "Inventions then cannot, in nature, be a subject of property."
                                    -- Thomas Jefferson, August 13, 1813



