[fitsbits] start of Public Comment Period on compressed FITS image and tables
Douglas Tody
dtody at nrao.edu
Thu Jul 23 20:08:40 EDT 2015
I am a bit late reviewing these discussions, but just want to note that
if we are serious about use of MEF to store big data, there really does
need to be an index for the MEF. This issue is related to INHERIT,
since global structure and info is required (the index is also global),
and to a lesser extent to the issue of padding the HDU with blank
keywords to avoid having to rebuild the entire large mega- or gigabyte
file when adding a single keyword to an HDU (whether or not this is an
issue depends upon the implementation, but it is almost always wise).
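To make the padding point concrete, here is a minimal sketch of the arithmetic. It relies only on the standard FITS layout (80-byte cards, 2880-byte blocks); the function names and the `reserve` parameter are hypothetical, and this is not taken from any particular FITS library.

```python
CARD_LEN = 80                              # bytes per header card (FITS standard)
BLOCK_LEN = 2880                           # bytes per FITS block (FITS standard)
CARDS_PER_BLOCK = BLOCK_LEN // CARD_LEN    # 36 cards fit in one block


def header_bytes(n_cards, reserve=0):
    """Size in bytes of a header holding n_cards cards (END included)
    plus `reserve` blank cards set aside for later additions."""
    total = n_cards + reserve
    blocks = -(-total // CARDS_PER_BLOCK)  # ceiling division
    return blocks * BLOCK_LEN


def spare_cards(n_cards, reserve=0):
    """Card slots that can be filled without growing the header."""
    return header_bytes(n_cards, reserve) // CARD_LEN - n_cards


def can_add_in_place(n_cards, n_new, reserve=0):
    """True if n_new keywords fit without shifting the data that follows."""
    return n_new <= spare_cards(n_cards, reserve)


# A header that exactly fills one block has no room left, so adding one
# keyword would push every later byte of the file down by a full block.
print(can_add_in_place(36, 1))              # no spare slot
print(can_add_in_place(36, 1, reserve=10))  # reserved blanks absorb it
```

Whether a writer actually rewrites the whole file in the first case depends on the implementation, as noted above; the sketch only shows when the header itself has to grow.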
A useful analogous format is ZIP or JAR, which is a multi-file storage
format that has such an index and can also be serialized as a single
file or container (probably the same could be said about HDF5 although
that is far more complex). In an efficient implementation the container
file can be accessed at runtime unpacked as a directory tree, but is
easily serialized as a single container file, and later unpacked for RW
access. RO access could be either directly to the indexed MEF, or more
simply to the equivalent unpacked directory tree. The FOREIGN
convention, BTW, also addresses the directory tree structure, although an
index could be a better solution. The current FITS MEF is trying to do
this sort of thing, but is not quite there.
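For readers less familiar with the ZIP analogy, a small sketch using Python's standard `zipfile` module shows what the index buys: the member list and byte offsets come from the archive's central directory, so one member can be located without scanning the others. The member names below are made up for illustration and do not correspond to any FITS convention.

```python
import io
import zipfile

# Build a small archive in memory; the member names are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("primary/header.txt", "SIMPLE = T")
    zf.writestr("ext1/data.bin", b"\x00" * 5000)
    zf.writestr("ext2/data.bin", b"\x01" * 3000)

# Reopen it: infolist() is populated from the central directory (the
# index), which records where each member starts and how large it is.
with zipfile.ZipFile(buf) as zf:
    index = {
        info.filename: (info.header_offset, info.file_size)
        for info in zf.infolist()
    }
    # Direct access to the last member via the index.
    member = zf.read("ext2/data.bin")

for name, (offset, size) in index.items():
    print(f"{name}: offset={offset} size={size}")
```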
I agree though, that this is premature to be addressed now, although it
is relevant to consider when reviewing existing conventions. I just
want to note that it would not be difficult to add, and would make FITS
MEF format far more useful for complex datasets. BINTABLE has its uses
as well as a container, e.g., for generic conventional tables certainly,
but also for tables that store spectra or smaller images like cutouts
wherein the data segment can be represented as an array, but it is not
ideal for large datasets containing arbitrary types of objects, each of
which can be very large. A MEF or other serialization with an index
works well as a generic container.
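As an illustration of what such an index would hold, the following sketch walks a toy MEF held in memory and records the byte offset of each HDU. Without a stored index, this sequential walk over every header is what a reader must do to reach the Nth HDU. The sketch is an assumption-laden toy, not a proposed layout: the helper names are hypothetical, the value parser is deliberately naive, and only simple image HDUs are handled (no random groups).

```python
import io

CARD, BLOCK = 80, 2880  # FITS card and block sizes in bytes


def card(key, value):
    """Format one 'KEYWORD = value' card (toy, free-format value)."""
    return f"{key:<8}= {value}"


def make_hdu(cards, data=b""):
    """Build one toy HDU: blank-padded header, then zero-padded data."""
    header = "".join(c.ljust(CARD) for c in cards + ["END"]).encode("ascii")
    header += b" " * (-len(header) % BLOCK)
    return header + data + b"\0" * (-len(data) % BLOCK)


def read_header(f):
    """Read header blocks up to END; return {keyword: raw value}, or None at EOF."""
    keys = {}
    while True:
        block = f.read(BLOCK)
        if len(block) < BLOCK:
            return None
        for i in range(0, BLOCK, CARD):
            text = block[i:i + CARD].decode("ascii")
            key = text[:8].strip()
            if key == "END":
                return keys
            if text[8:10] == "= ":
                # Naive: drops anything after '/', ignores quoted slashes.
                keys[key] = text[10:].split("/")[0].strip()


def build_index(f):
    """Return [(name, header_offset, data_offset, data_bytes), ...]."""
    index = []
    while True:
        offset = f.tell()
        keys = read_header(f)
        if keys is None:
            return index
        naxis = int(keys.get("NAXIS", 0))
        npix = 1 if naxis else 0
        for n in range(1, naxis + 1):
            npix *= int(keys[f"NAXIS{n}"])
        nbytes = (abs(int(keys["BITPIX"])) // 8
                  * int(keys.get("GCOUNT", 1))
                  * (int(keys.get("PCOUNT", 0)) + npix))
        data_offset = f.tell()
        f.seek(nbytes + (-nbytes % BLOCK), 1)  # skip the padded data unit
        name = keys.get("EXTNAME", "PRIMARY").strip("' ")
        index.append((name, offset, data_offset, nbytes))


mef = (
    make_hdu([card("SIMPLE", "T"), card("BITPIX", 8), card("NAXIS", 0)])
    + make_hdu([card("XTENSION", "'IMAGE   '"), card("BITPIX", 16),
                card("NAXIS", 2), card("NAXIS1", 100), card("NAXIS2", 50),
                card("PCOUNT", 0), card("GCOUNT", 1),
                card("EXTNAME", "'SCI'")], bytes(100 * 50 * 2))
    + make_hdu([card("XTENSION", "'IMAGE   '"), card("BITPIX", -32),
                card("NAXIS", 1), card("NAXIS1", 10),
                card("PCOUNT", 0), card("GCOUNT", 1),
                card("EXTNAME", "'ERR'")], bytes(40))
)

index = build_index(io.BytesIO(mef))
for entry in index:
    print(entry)
```

Stored once (for example in a table with one row per HDU), a list like `index` would let a reader seek straight to any HDU instead of repeating the walk.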
- Doug
On Fri, 26 Jun 2015, Lucio Chiappetti wrote:
> On Fri, 26 Jun 2015, van Nieuwenhoven, Richard wrote:
>
>> A minor concern is that it would help if there was some kind of index or
>> other way to have jump points to the separate hdu's.
>
> Not sure if hierarchical grouping does that (to be examined in the convention
> working team). While I proposed an INDEX extension (could just be a normal
> BINTABLE with a reserved EXTNAME and layout) in the context of the other
> technical team (new features for FITS).
>
> As I said many times, I am not a keen supporter of large MEFs with
> arbitrarily long number of HDUs, except for particular purposes (I call it a
> FAR, FITS ARchive, like a tar). And a FAR would benefit from an INDEX.
>
> But this is premature to be discussed now and here.
>
> _______________________________________________
> fitsbits mailing list
> fitsbits at listmgr.nrao.edu
> https://listmgr.nrao.edu/mailman/listinfo/fitsbits
>