[fitsbits] XTENSION = 'FITS' proposal

Fri Apr 12 15:51:01 EDT 2002

"William Pence" <pence at tetra.gsfc.nasa.gov> wrote in message
news:3CB70646.5795D6A8 at tetra.gsfc.nasa.gov...
> Here's my $0.02 worth of comments on Perry's XTENSION = 'FITS' proposal:
>
> This proposal offers a way to group a sequential set of HDUs in a FITS
> file.  The HDU with XTENSION = 'FITS' basically serves as a marker to say
> that all the HDUs contained in following NAXIS1 bytes should be treated as
> a group.  However, having to update the NAXIS1 keyword value (and the
> proposed 'XINDn' index keywords) whenever any of the HDUs within that
> group are modified, or when HDUs are added or deleted in the group, is a
> big disadvantage.  An alternate way to sequentially group HDUs would be to
> define a 'begin group' and 'end group' HDU that would delimit all the HDUs
> in that group.  These group marker HDUs could simply be dataless IMAGE
> extensions (with NAXIS = 0) that have EXTNAME = GROUP_BEGIN and EXTNAME =
> GROUP_END, respectively.  As with Perry's proposal, one could have groups
> imbedded within groups.  Using these group marker HDUs would eliminate the
> need to update any pointers that define the size of the group, and make it
> easy to insert or delete HDUs within a group.
>
To make some clarifying comments: the XINDn keywords would be optional
(or might not even be accepted into the proposal; they are not essential
to its nature). Their presence gives a capability that no other proposal
gives, namely the ability to locate an extension within a file without
checking all intermediate headers. That certainly comes at the price of
having to update them should the file change in the size or number of any
of its constituent elements.
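
A minimal sketch of what "checking all intermediate headers" costs in a
plain FITS file, assuming the standard 2880-byte blocking and the usual
data-size rule (|BITPIX|/8 * GCOUNT * (PCOUNT + NAXIS1*...*NAXISn)). The
helper names (card, make_hdu, read_header, data_bytes, find_hdu) and the toy
file are hypothetical, and the header parser is deliberately simplistic. An
index such as the proposed XINDn keywords would replace the loop in
find_hdu with a single lookup; how those keywords would encode positions is
not modelled here.

```python
import math

BLOCK = 2880  # FITS blocking factor in bytes
CARD = 80     # length of one header card

def card(key, value):
    """Toy fixed-format card writer (logical, integer, or short string)."""
    if isinstance(value, bool):
        field = ("T" if value else "F").rjust(20)
    elif isinstance(value, int):
        field = str(value).rjust(20)
    else:
        field = f"'{value:<8}'"
    return f"{key:<8}= {field}".ljust(CARD)

def make_hdu(cards, nbytes):
    """Header padded with spaces plus zero-filled data, both block-aligned."""
    text = "".join(card(k, v) for k, v in cards) + "END".ljust(CARD)
    header = text.ljust(math.ceil(len(text) / BLOCK) * BLOCK).encode("ascii")
    return header + bytes(math.ceil(nbytes / BLOCK) * BLOCK)

def read_header(buf, offset):
    """Return (keyword dict, offset of the first data byte)."""
    keys, pos = {}, offset
    while True:
        c = buf[pos:pos + CARD].decode("ascii")
        pos += CARD
        key = c[:8].strip()
        if key == "END":
            break
        if c[8:10] == "= ":
            keys[key] = c[10:].split("/")[0].strip()
    return keys, offset + math.ceil((pos - offset) / BLOCK) * BLOCK

def data_bytes(keys):
    """Padded size of the data unit described by a header."""
    naxis = int(keys["NAXIS"])
    if naxis == 0:
        return 0
    npix = math.prod(int(keys[f"NAXIS{i}"]) for i in range(1, naxis + 1))
    size = (abs(int(keys["BITPIX"])) // 8
            * int(keys.get("GCOUNT", 1))
            * (int(keys.get("PCOUNT", 0)) + npix))
    return math.ceil(size / BLOCK) * BLOCK

def find_hdu(buf, index):
    """Offset of HDU `index` (0 = primary) and the number of headers parsed."""
    offset = 0
    for _ in range(index):
        keys, data_start = read_header(buf, offset)
        offset = data_start + data_bytes(keys)
    return offset, index

def image_ext(bitpix, shape):
    """Cards for a toy IMAGE extension with the given axis lengths."""
    cards = [("XTENSION", "IMAGE"), ("BITPIX", bitpix), ("NAXIS", len(shape))]
    cards += [(f"NAXIS{i + 1}", n) for i, n in enumerate(shape)]
    return cards + [("PCOUNT", 0), ("GCOUNT", 1)]

buf = (
    make_hdu([("SIMPLE", True), ("BITPIX", 8), ("NAXIS", 0)], 0)
    + make_hdu(image_ext(8, (100, 10)), 1000)
    + make_hdu(image_ext(16, (2000, 2)), 8000)
    + make_hdu(image_ext(8, ()), 0)
)

offset, headers_parsed = find_hdu(buf, 3)
print(offset, headers_parsed)
```

Reaching the last HDU here means parsing every header before it; that
sequential walk is the work an index would avoid, at the price of keeping
the index up to date.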

To address the other issue, having to update NAXIS1: it is certainly one
extra thing one must handle (and it is not easily handled on sequential
devices like tapes if you don't know how many or how big the elements in a
"group" are when you start writing it... but some data sets already have
this problem with NAXIS1, as previously argued). When creating data on
disk, though, I do not see it as a big problem. It is neither expensive nor
difficult to update an existing keyword in an existing header after the
fact when the data are written to the file sequentially (that is, without
knowing how many or how big the elements are until the enclosing FITS
extension is 'closed').
Again, it is a case of good aspects and bad aspects. The fact that NAXIS1
applies to the whole group means it is easy to skip groups when reading.
This is not true for the GROUP_BEGIN/GROUP_END proposal. One person's big
disadvantage is sometimes someone else's big advantage.
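
A minimal sketch of the "update after the fact" pattern on a seekable file,
using an in-memory buffer. The layout of the XTENSION = 'FITS' header used
here (BITPIX = 8, NAXIS = 1, NAXIS1 = number of bytes in the group) is an
assumption for illustration, as are the helper names int_card,
read_int_card, and patch_int_card; the member HDUs are stood in for by
zero-filled blocks.

```python
import io

CARD = 80     # length of one header card
BLOCK = 2880  # FITS blocking factor in bytes

def int_card(key, value):
    """Fixed-format integer card: value right-justified in columns 11-30."""
    return f"{key:<8}= {value:>20}".ljust(CARD).encode("ascii")

def read_int_card(f, header_start, key):
    """Return (file position, integer value) of `key` in the header."""
    pos = header_start
    while True:
        f.seek(pos)
        c = f.read(CARD)
        name = c[:8].strip()
        if not c or name == b"END":
            raise KeyError(key)
        if name == key.encode("ascii"):
            return pos, int(c[10:30].decode("ascii"))
        pos += CARD

def patch_int_card(f, header_start, key, value):
    """Overwrite an existing card in place; the header length is unchanged."""
    pos, _ = read_int_card(f, header_start, key)
    f.seek(pos)
    f.write(int_card(key, value))

f = io.BytesIO()
group_start = f.tell()
header = b"".join([
    "XTENSION= 'FITS    '".ljust(CARD).encode("ascii"),
    int_card("BITPIX", 8),
    int_card("NAXIS", 1),
    int_card("NAXIS1", 0),   # placeholder: group size not known yet
    int_card("PCOUNT", 0),
    int_card("GCOUNT", 1),
    "END".ljust(CARD).encode("ascii"),
])
f.write(header.ljust(BLOCK))  # bytes.ljust pads with ASCII spaces

# Write the members sequentially without knowing their sizes in advance.
members_start = f.tell()
for nblocks in (2, 5, 1):     # stand-ins for three member HDUs
    f.write(bytes(nblocks * BLOCK))
end_of_group = f.tell()

# "Close" the group: go back and fill in the real size, then resume.
patch_int_card(f, group_start, "NAXIS1", end_of_group - members_start)
f.seek(end_of_group)

# A reader can now skip the whole group using only its header.
_, group_bytes = read_int_card(f, group_start, "NAXIS1")
next_hdu = group_start + BLOCK + group_bytes
print(group_bytes, next_hdu)
```

The patch touches one 80-byte card and moves no data, which is why it is
cheap on disk and impractical on a tape that cannot seek backwards.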

But I consider the GROUP_BEGIN/GROUP_END proposal to have a
larger disadvantage. Header keywords in the XTENSION='FITS'
proposal have a very clear meaning for the group. They are parameters
that apply to the whole group. There is no equivalent mechanism (without
adding more machinery either with special extensions or header keywords)
for the alternative proposed.

The cost that FITS imposes for insertion or deletion of extensions
(rewriting all the following data) far, far outweighs any disadvantage of
having to update header parameters to account for insertions or deletions. I'm
not sure I would use the word 'easy' to describe FITS insertions or
deletions. But perhaps I have a distorted sense of perspective :-)

> One big advantage to using these GROUP_BEGIN and GROUP_END marker HDUs
> instead of a new XTENSION = 'FITS' HDU is that existing FITS software
> would still be able to access any of the HDUs within the file, even if it
> knows nothing about the grouping convention.
>
Can't argue against that.

> I'm not convinced, however, that either of these ways to sequentially
> group HDUs provides really useful new functionality.  There are other ways
> to group HDUs within a FITS file that would not require the HDUs in the
> group all be sequentially ordered, and would require no new types of HDUs.
> For example, one could invent a convention that simply uses the EXTNAME
> keyword itself to define the hierarchical structure of the HDUs in the
> file.  For example, (and this simply illustrates the concept, and is not a
> fully developed proposal) a FITS file might contain 6 HDUs which have the
> following EXTNAME values:
>
>         EXTNAME = 'IMAGE1'
>         EXTNAME = 'IMAGE2'
>         EXTNAME = '[IMAGE1]ExpMaP'
>         EXTNAME = '[IMAGE2]ExpMap'
>         EXTNAME = '[IMAGE1]BadPix'
>         EXTNAME = '[IMAGE2]BadPix'
>
> This defines 2 hierarchical groups, each consisting of a base image and
> associated exposure maps and bad pixel lists, but in this case there is no
> restriction that the HDUs in the group be physically adjacent in the FITS
> file, or even co-located within the same file.
>
This is an interesting proposal (though I'm not sure how one is supposed
to interpret EXTNAME references to extensions not present in the
same file; I guess I misunderstand the point). But I do have a couple of
problems with it, one with its capabilities, and one with the presumption
behind one of the problems it (and some other proposals) solves.

It is flexible in one sense, but it lacks a key element. That is, if I have
the IMAGE1 extension, how do I find out what it "contains" (or if it
contains anything)? I must search the whole file to find out. If one
puts keywords in the header to point to the children, one has
just bought into the big disadvantage. But even so, I'd be more
inclined to find this a more useful mechanism than the first you proposed.
It does have the drawback of introducing incompatibilities for us with
existing data: we use EXTNAME now, and if we adopted this, we
would have to think about how to make older data mesh with newer data
with regard to EXTNAME usage (which would not be an issue
with what I proposed). For example, we group data using EXTVER
(perhaps a bad choice, but at the time, there didn't appear to be
much else to choose from). We have components of files using
EXTNAME = 'SCI', 'ERR', 'DQ'. If we used the above scheme,
these would now be '[IMSET1]SCI', '[IMSET1]ERR', etc.
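
A minimal sketch of how software might interpret the '[PARENT]child'
EXTNAME convention from the quoted example. The convention was offered only
as an illustration, so the parsing rule below (a leading bracketed parent
name followed by the child name) is an assumption, and split_extname and
build_children are hypothetical helpers. The EXTNAME values are copied from
the quoted list; positions are simply indices into that list.

```python
import re
from collections import defaultdict

# Assumed syntax: '[PARENT]child'; anything else is a top-level name.
PATTERN = re.compile(r"^\[(?P<parent>[^\]]+)\](?P<child>.+)$")

def split_extname(extname):
    """Return (parent, child); parent is None for a top-level HDU."""
    m = PATTERN.match(extname.strip())
    if m:
        return m.group("parent"), m.group("child")
    return None, extname.strip()

def build_children(extnames):
    """Map each parent name to its (child name, position) pairs.

    Every EXTNAME in the file has to be visited once, because nothing in
    the parent's own header says which children exist.
    """
    children = defaultdict(list)
    for position, name in enumerate(extnames):
        parent, child = split_extname(name)
        if parent is not None:
            children[parent].append((child, position))
    return dict(children)

extnames = [
    "IMAGE1",
    "IMAGE2",
    "[IMAGE1]ExpMaP",
    "[IMAGE2]ExpMap",
    "[IMAGE1]BadPix",
    "[IMAGE2]BadPix",
]

children = build_children(extnames)
print(children["IMAGE1"])
print(split_extname("SCI"), split_extname("[IMSET1]SCI"))
```

Answering "what does IMAGE1 contain?" requires the full pass in
build_children, and existing names such as 'SCI' parse as top-level HDUs
with no parent, which is the compatibility question raised above.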

The other point is that I'm not sure how useful random ordering is
in FITS files. I tend to think that random ordering is most useful
when one is replacing or deleting elements while constructing
or modifying a file, thus leading to a somewhat scrambled order.
But FITS is not well suited to such dynamic rearrangement.
(In other words, if FITS is going to require you to move most
of the data, you might as well get the order right too.)
Relaxed ordering requirements are useful for some kinds of problems
(say you want to put out the "group" keywords after you finish
with all the components, so IMAGE1 comes after [IMAGE1]BadPix).
Or perhaps groups are interleaved with others for some processing
reason, so I'd agree it has its uses.

The main driver for the XTENSION='FITS' proposal was not to provide a good
working format (FITS is still best used for sequential or 'fixed' format
data output, i.e., extensions of known number, size, and order),
but to provide a way of packaging data from standard pipelines that is
simply interpreted, whether used as input for data analysis or
obtained from an archive.

Perry