[fitsbits] Re: FITS Bintable proposals
William Pence
William.D.Pence at nasa.gov
Tue Dec 14 16:37:29 EST 2004
The following comments about the FITS binary table proposals from Arnold
Rots and Preben Grosbol may be of general interest, so I'm reposting this
here to the wider FITSBITS audience. I have updated the draft proposals,
available from http://fits.gsfc.nasa.gov/bintable_proposals.html with the
changes that are discussed here. -Bill Pence
---------------------------------------------------------------
Arnold Rots wrote:
> OK, I read the binary table proposals
> and have some comments/questions.
>
> The last sentence of the changes brought about by Proposal 1:
> "The meaning of these bytes is defined in section 8.3.5."
> The standard currently says "...One proposed application is described
> in Appendix B.1."
> Does this change mean that we will now restrict PCOUNT to use by
> Variable Length Arrays only?
> That's OK, but probably should be said explicitly.
This was an oversight on my part; I had not intended to restrict other
possible uses of PCOUNT and the heap storage area. In principle there is no
reason why two or more conventions could not share the heap space (just as
the memory space managed by 'malloc' can be used for multiple purposes). Note
also that currently it is legal to have a binary table with a non-zero heap
(PCOUNT > 0) but without any 'P' variable length columns in the table; in
that case the meaning of the heap data is currently undefined. The proposal
draft has been modified so that section 8.3.3.2, in its entirety, will read:
"8.3.3.2 Bytes Following Main Table
The main data table shall be followed by an additional data area
containing zero or more bytes, as specified by the value of the
PCOUNT keyword. One use for this data area is described in section
8.3.5. This does not preclude other uses for these bytes."
> In the text of 8.3.5:
> Second par.: it says tables can be read by programs not understanding
> VLAs (:-). That's correct, but it may be good to point out that they'd
> better know about applying PCOUNT.
The value of PCOUNT should not affect how programs read the main data table,
and it is only necessary to apply PCOUNT when calculating the total number
of bits in the binary table extension and hence the starting location of the
next HDU. This is already stated in section 5.4.1.2 on Conforming
Extensions; given the legalistic style of the FITS Standard (it is not a
verbose User's Guide) it is probably better to leave the current wording as
is and not duplicate the same requirement in more than one section.
> I would prefer to refer to data consistently as plural:
> P 1, par 3, l 3: "...data are not stored..."
> P 2, par 2, l 1-2: "...a table are not stored...records; they are stored..."
> There may be more...
These and a few other cases have been corrected, so now "data" is always
used as plural.
> P 2, par 5: Twice it says "NAXIS x NAXIS2"; this should be
> "NAXIS1 x NAXIS2"
This strange 'cut and paste' typo has been corrected.
> One thing is not clear about PCOUNT: is it (gap + heap) or (gap + heap
> + padding)? The latter seems implied by the example in the next par.
PCOUNT does not include any padding bytes needed to make the length of the
data unit a multiple of 2880 bytes. I've changed the example to use a heap
size of 3000 (instead of 2880) to make this clear.
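To make the distinction between PCOUNT and the padding concrete, here is a small Python sketch of the size calculation for a conforming extension. The function name, the 5 x 168 byte table, and the absence of a gap before the heap are assumptions chosen for illustration; they are not taken from the draft text.

```python
BLOCK = 2880  # FITS logical record length in bytes


def bintable_data_unit(naxis1, naxis2, pcount, bitpix=8, gcount=1):
    """Return (nbits, nbytes, padding, padded_bytes) for a binary table.

    Uses the conforming-extension size formula
        Nbits = |BITPIX| * GCOUNT * (PCOUNT + NAXIS1 * NAXIS2).
    PCOUNT counts only the bytes that follow the main table (gap + heap);
    the fill needed to reach a multiple of 2880 bytes is computed
    separately and is never part of PCOUNT.
    """
    nbits = abs(bitpix) * gcount * (pcount + naxis1 * naxis2)
    nbytes = nbits // 8
    padding = -nbytes % BLOCK
    return nbits, nbytes, padding, nbytes + padding


# Hypothetical table: 5 rows of 168 bytes, no gap, 3000-byte heap.
nbits, nbytes, padding, padded = bintable_data_unit(168, 5, 3000)
print(nbytes, padding, padded)
```

The padded size is what a reader adds to the start of the data unit to find the next HDU; the main table itself can be read from NAXIS1 and NAXIS2 alone.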
> P 2, par 6, l 1: "...5 rows of each 168..."
> ^^
This has been reworded.
> I would also like to see a better definition of PCOUNT in the example.
This has been reworded and hopefully is now clearer.
> Re-reading all of this made me realize that, in retrospect, I am
> uncomfortable with the 32 bit signed restriction. Here we start
> worrying about 64 bit integers but we restrict the size of the heap to
> 2 GB through the second half of the P fields.
> But I should not reopen the debate :-)
Some reasons for restricting the array length and offset to signed
integers are:
- there is no precedent in FITS for using unsigned 32-bit integers
- use of unsigned integers is problematic in some languages like Fortran
- as far as I'm aware, the current software implementations of the heap
(e.g. CFITSIO) interpret these values as signed integers
I share your discomfort about this, however, and think that perhaps in the
future we could reverse this decision and redefine these to be unsigned
integers. (This is possible because doing so would not invalidate any
existing FITS files and thus would not violate the "once FITS, always FITS"
rule). We need more time to evaluate all the implications before making
this decision, so for now I think it is best to restrict these fields to be
signed integers, but leave the door open for a change in the future. (See
also Preben's related comment, below.)
> As to the TDIMn replacement for 8.3.2, I cringe a bit at the thought
> of the opportunities for abuse of combining a TDIM with a P TFORM.
> Would it be helpful (again, without reopening the discussion) to give
> an indication of what usage is envisioned here?
I'm not sure what else needs to be added, since this is the FITS standard
and not a user's guide. There are perhaps 2 main uses of TDIMn with VLAs:
1. Every row of the VLA column contains the same size/shape image,
except for some rows where the array has zero length (is not present).
This may happen if an image is optional, and not necessarily applicable to
every row of the table. In this case the TDIMn keyword is used normally to
give the dimensions of the image, when it is present. The TDIMn keyword is
ignored for those rows with no image.
2. If the VLA column contains a different size/shape image in each row,
then the TDIMn keyword cannot be used, and instead the TDIMn value for
each row would be given in another column of the table (whose name is
'TDIMn' where n is the number of the VLA column). This follows the
'Green Bank Convention' for collapsing a constant table column into a
keyword, or expanding a keyword whose value varies from row to row into a
column.
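The two cases above could be handled by a reader along the following lines. This is only a sketch in Python: the helper names `parse_tdim` and `row_shape` are hypothetical, and the fallback to a one-dimensional shape when no TDIMn information exists is an assumption of the sketch, not something the proposal states.

```python
def parse_tdim(value):
    """Parse a TDIMn-style string such as '(64,32)' into a tuple of ints."""
    inner = value.strip().strip("()")
    return tuple(int(part) for part in inner.split(","))


def row_shape(nelem, tdim_keyword=None, tdim_cell=None):
    """Return the image shape for one row of a VLA column, or None.

    nelem        -- element count from this row's array descriptor
    tdim_keyword -- value of the TDIMn header keyword, if present
    tdim_cell    -- this row's entry in a per-row 'TDIMn' column, if present
    """
    if nelem == 0:
        return None  # no image in this row, so any TDIMn value is ignored
    if tdim_cell is not None:
        return parse_tdim(tdim_cell)  # shape varies from row to row
    if tdim_keyword is not None:
        return parse_tdim(tdim_keyword)  # same shape in every row
    return (nelem,)  # assumption: treat as a plain 1-D array


print(row_shape(2048, tdim_keyword="(64,32)"))
print(row_shape(0, tdim_keyword="(64,32)"))
print(row_shape(6, tdim_cell="(2,3)"))
```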
============================================
In a separate message on 14-Dec, Preben Grosbol wrote:
> I found only one other issue namely the first modification of VLA which
> explicit include P column in the types effected by TNULLn. My two
> comments are:
> 1) Since specifying a zero array length has the same effect, TNULL
> values in P columns are not needed.
That will work only if the entire array is null; we still need to use TNULLn
to define a null value if only some elements of the variable length array
are null.
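The difference between a wholly absent array and an array with a few null elements can be illustrated as follows. This is a Python sketch with made-up data; the TNULL value of -999 and the helper name `null_mask` are assumptions for the example only.

```python
def null_mask(elements, tnull):
    """Flag which elements of one variable-length integer array are null."""
    return [value == tnull for value in elements]


TNULL = -999  # hypothetical TNULLn value for an integer 'P' column

rows = [
    [3, -999, 7],  # only the middle element is null: needs TNULLn
    [],            # zero-length array: the whole entry is absent
    [12, 5],       # no null elements
]

for row in rows:
    print(len(row), null_mask(row, TNULL))
```

A zero array length can only express the second row; the first row cannot be represented without a null value for individual elements.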
> 2) We have signed integer so in principle we should explicitly define
> that VLA elements with either length or offset being negative are
> regarded as undefined values.
The following sentence has been added to the draft proposal: "The meaning
of a negative value for either of these integers is undefined by this
standard." This leaves an opening for future experimentation with using
unsigned integers (large unsigned integers are equivalent to negative signed
integers) in these fields.
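The equivalence mentioned above is easy to see by decoding the same 8 descriptor bytes both ways. The sketch below uses Python's standard `struct` module and assumes the usual 'P' descriptor layout of two big-endian 32-bit integers (element count, then byte offset into the heap); the function name and the sample values are made up for illustration.

```python
import struct


def read_descriptor(raw):
    """Decode an 8-byte 'P' descriptor as two big-endian signed 32-bit ints."""
    nelem, offset = struct.unpack(">ii", raw)
    return nelem, offset


# Hypothetical descriptor written by software that treats the fields as
# unsigned: 3,000,000,000 elements at heap offset 16.
raw = struct.pack(">II", 3_000_000_000, 16)

as_signed = read_descriptor(raw)
as_unsigned = struct.unpack(">II", raw)

print(as_signed)    # count appears negative to a signed reader
print(as_unsigned)  # same bytes read as unsigned
```

A reader that follows the current signed interpretation sees a negative count here, which is exactly the case the added sentence leaves undefined.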
--
____________________________________________________________________
Dr. William Pence William.D.Pence at nasa.gov
NASA/GSFC Code 662 HEASARC +1-301-286-4599 (voice)
Greenbelt MD 20771 +1-301-286-1684 (fax)