[fitsbits] start of Public Comment Period on compressed FITS image and tables

William Pence William.Pence at nasa.gov
Fri Jul 24 10:50:47 EDT 2015


Regarding an index for a MEF (Multi-Extension FITS file), I would think 
that it could be constructed on the fly very efficiently if it didn't 
already exist.  Presumably all the information needed to construct the 
index is available as keywords in each of the extensions.  One can move 
sequentially through all the extensions in a FITS file very quickly. 
CFITSIO, if I remember correctly, can scan through more than 50000 
extensions/HDUs per second on a modest Linux machine.  During that 
process, CFITSIO builds a simple index in memory so that it can move 
back directly to any previously read extension if desired.
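
A minimal sketch, in Python, of the kind of sequential scan described 
above.  This is not CFITSIO's implementation; the function names and the 
layout of the index entries are illustrative assumptions.  It reads only 
the headers, computes each data unit's size from BITPIX, NAXISn, PCOUNT 
and GCOUNT, and seeks past the data.  Random-groups files and doubled 
quotes inside string values are not handled.

```python
import io

BLOCK = 2880  # FITS logical record size in bytes
CARD = 80     # length of one header card in bytes


def parse_value(raw):
    """Return the value field of a card, without quotes or trailing comment."""
    raw = raw.strip()
    if raw.startswith("'"):
        end = raw.find("'", 1)  # doubled quotes ('') are not handled here
        return raw[1:end].rstrip() if end > 0 else raw[1:].rstrip()
    return raw.split("/", 1)[0].strip()


def read_header(f):
    """Read one header; return {keyword: value string}, or None at end of file."""
    cards = {}
    while True:
        block = f.read(BLOCK)
        if len(block) < BLOCK:
            return None  # end of file (or a truncated header)
        for i in range(0, BLOCK, CARD):
            card = block[i:i + CARD].decode("ascii", errors="replace")
            key = card[:8].strip()
            if key == "END":
                return cards
            if card[8:10] == "= ":  # value indicator in columns 9-10
                cards[key] = parse_value(card[10:])


def data_bytes(h):
    """Unpadded size of the data unit that follows header h."""
    naxis = int(h.get("NAXIS", 0))
    if naxis == 0:
        return 0
    n = 1
    for i in range(1, naxis + 1):
        n *= int(h[f"NAXIS{i}"])
    bytes_per_value = abs(int(h["BITPIX"])) // 8
    return bytes_per_value * int(h.get("GCOUNT", 1)) * (int(h.get("PCOUNT", 0)) + n)


def build_index(f):
    """Scan every HDU in the open binary file f and return a list of entries."""
    index = []
    while True:
        header_offset = f.tell()
        h = read_header(f)
        if h is None:
            break
        data_offset = f.tell()
        size = data_bytes(h)
        index.append({
            "hdu": len(index),  # 0-based here; the primary HDU is 0
            "extname": h.get("EXTNAME", ""),
            "header_offset": header_offset,
            "data_offset": data_offset,
            "data_bytes": size,
        })
        # Data units are padded to a whole number of 2880-byte blocks.
        f.seek(data_offset + -(-size // BLOCK) * BLOCK)
    return index


if __name__ == "__main__":
    # Smallest possible FITS file: a primary header with no data unit.
    cards = ["SIMPLE  =                    T",
             "BITPIX  =                    8",
             "NAXIS   =                    0",
             "END"]
    raw = "".join(c.ljust(CARD) for c in cards).ljust(BLOCK).encode("ascii")
    print(build_index(io.BytesIO(raw)))
```

With such an index in memory, moving back to a previously read extension 
is a single seek to its header_offset or data_offset.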

-Bill


On 7/23/2015 8:08 PM, Douglas Tody wrote:
> I am a bit late reviewing these discussions, but just want to note that
> if we are serious about use of MEF to store big data, there really does
> need to be an index for the MEF.  This issue is related to INHERIT,
> since global structure and info is required (the index is also global),
> and to a lesser extent to the issue of padding the HDU with blank
> keywords to avoid having to rebuild the entire large mega- or giga-file
> when adding a single keyword to an HDU (whether or not this is an issue
> depends upon the implementation, but it is almost always wise).
>
> A useful analogous format is ZIP or JAR, which is a multi-file storage
> format that has such an index and can also be serialized as a single
> file or container (probably the same could be said about HDF5 although
> that is far more complex).  In an efficient implementation the container
> file can be accessed at runtime unpacked as a directory tree, but is
> easily serialized as a single container file, and later unpacked for RW
> access.  RO access could be either directly to the indexed MEF, or more
> simply to the equivalent unpacked directory tree.  The FOREIGN
> convention BTW, also addresses the directory tree structure, although an
> index could be a better solution.  The current FITS MEF is trying to do
> this sort of thing, but is not quite there.
>
> I agree though, that this is premature to be addressed now, although it
> is relevant to consider when reviewing existing conventions.  I just
> want to note that it would not be difficult to add, and would make FITS
> MEF format far more useful for complex datasets.  BINTABLE has its uses
> as well as a container, e.g., for generic conventional tables certainly,
> but also for tables that store spectra or smaller images like cutouts
> wherein the data segment can be represented as an array, but it is not
> ideal for large datasets containing arbitrary types of objects, each of
> which can be very large.  A MEF or other serialization with an index
> works well as a generic container.
>
>      - Doug
>
>
>
> On Fri, 26 Jun 2015, Lucio Chiappetti wrote:
>
>> On Fri, 26 Jun 2015, van Nieuwenhoven, Richard wrote:
>>
>>> A minor concern is that it would help if there was some kind of index
>>> or other way to have jump points to the separate hdu's.
>>
>> Not sure if hierarchical grouping does that (to be examined in the
>> convention working team).  I did, however, propose an INDEX extension
>> (it could just be a normal BINTABLE with a reserved EXTNAME and layout)
>> in the context of the other technical team (new features for FITS).
>>
>> As I have said many times, I am not a keen supporter of large MEFs with
>> an arbitrarily large number of HDUs, except for particular purposes (I
>> call it a FAR, FITS ARchive, like a tar). And a FAR would benefit from
>> an INDEX.
>>
>> But this is premature to be discussed now and here.
>>
>> _______________________________________________
>> fitsbits mailing list
>> fitsbits at listmgr.nrao.edu
>> https://listmgr.nrao.edu/mailman/listinfo/fitsbits
>>
>


-- 
____________________________________________________________________
Dr. William Pence    Astrophysicist     William.Pence at nasa.gov
NASA/GSFC Code 662     [Emeritus]       +1-301-286-4599 (voice)
Greenbelt MD 20771                      +1-301-286-1684 (fax)


