[fitsbits] Re: FITS Bintable proposals

Tue Dec 14 16:37:29 EST 2004

The following comments about the FITS binary table proposals from Arnold 
Rots and Preben Grosbol may be of general interest, so I'm reposting this 
here to the wider FITSBITS audience.  I have updated the draft proposals, 
available from http://fits.gsfc.nasa.gov/bintable_proposals.html with the 
changes that are discussed here.  -Bill Pence
---------------------------------------------------------------

Arnold Rots wrote:
> OK, I read the binary table proposals
> and have some comments/questions.
>
> The last sentence of the changes brought about by Proposal 1:
> "The meaning of these bytes is defined in section 8.3.5."
> The standard currently says "...One proposed application is described
> in Appendix B.1."
> Does this change mean that we will now restrict PCOUNT to use by
> Variable Length Arrays only?
> That's OK, but probably should be said explicitly.

This was an oversight on my part; I had not intended to restrict other 
possible uses of PCOUNT and the heap storage area.  In principle there is no 
reason why two or more conventions could not share the heap space (just as the 
memory space managed by 'malloc' can be used for multiple purposes). Note 
also that currently it is legal to have a binary table with a non-zero heap 
(PCOUNT > 0) but without any 'P' variable length columns in the table; in 
that case the meaning of the heap data is currently undefined.  The proposal 
draft has been modified so that section 8.3.3.2, in its entirety, will read:

   "8.3.3.2  Bytes Following Main Table
    The main data table shall be followed by an additional data area
    containing zero or more bytes, as specified by the value of the
    PCOUNT keyword.  One use for this data area is described in section
    8.3.5.  This does not preclude other uses for these bytes."

> In the text of 8.3.5:
> Second par.: it says tables can be read by programs not understanding
> VLAs (:-). That's correct, but it may be good to point out that they'd
> better know about applying PCOUNT.

The value of PCOUNT should not affect how programs read the main data table,
and it is only necessary to apply PCOUNT when calculating the total number
of bits in the binary table extension and hence the starting location of the
next HDU.  This is already stated in section 5.4.1.2 on Conforming
Extensions; given the legalistic style of the FITS Standard (it is not a
verbose User's Guide) it is probably better to leave the current wording as
is and not duplicate the same requirement in more than one section.
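
As a rough illustration only (it is not part of the draft proposal), the
size calculation could be sketched in Python as below.  The function names
are hypothetical, and the sketch assumes the usual binary table values
BITPIX = 8 and GCOUNT = 1, so that the general size formula reduces to
PCOUNT + NAXIS1 x NAXIS2 bytes.

```python
FITS_BLOCK = 2880  # FITS files are organized in 2880-byte blocks


def main_table_bytes(naxis1, naxis2):
    """Size of the main data table; PCOUNT plays no part in reading it."""
    return naxis1 * naxis2


def data_unit_bytes(naxis1, naxis2, pcount):
    """Return (unpadded, padded) byte sizes of a binary table data unit.

    Assumes BITPIX = 8 and GCOUNT = 1, as is usual for BINTABLE.
    """
    unpadded = pcount + naxis1 * naxis2
    # Round up to a whole number of 2880-byte blocks.
    padded = -(-unpadded // FITS_BLOCK) * FITS_BLOCK
    return unpadded, padded


def next_hdu_offset(data_start, naxis1, naxis2, pcount):
    """Byte offset of the next HDU, given where this data unit starts."""
    return data_start + data_unit_bytes(naxis1, naxis2, pcount)[1]
```

A reader that ignores the heap still needs only next_hdu_offset() to skip
correctly to the following HDU.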

> I would prefer to refer to data consistently as plural:
> P 1, par 3, l 3: "...data are not stored..."
> P 2, par 2, l 1-2: "...a table are not stored...records; they are stored..."
> There may be more...

These and a few other cases have been corrected, so now "data" is always
used as plural.

> P 2, par 5: Twice it says "NAXIS x NAXIS2"; this should be
> "NAXIS1 x NAXIS2"

This strange 'cut and paste' typo has been corrected.

> One thing is not clear about PCOUNT: is it (gap + heap) or (gap + heap
> + padding)?  The latter seems implied by the example in the next par.

PCOUNT does not include any padding bytes needed to make the length of the
data unit a multiple of 2880 bytes.  I've changed the example to use a heap
size of 3000 (instead of 2880) to make this clear.
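
The arithmetic can be sketched in Python as follows.  This is only an
illustration: the function name is hypothetical, and the 168 is assumed to
be the row length in bytes for a 5-row table, as in the example commented
on above.

```python
FITS_BLOCK = 2880  # data units are padded to a multiple of 2880 bytes


def pcount_and_padding(naxis1, naxis2, gap, heap):
    """Return (PCOUNT, padding) for a binary table.

    PCOUNT counts the gap and the heap, but not the padding bytes.
    """
    pcount = gap + heap
    used = naxis1 * naxis2 + pcount
    # Bytes needed to reach the next 2880-byte boundary (0 if already there).
    padding = (-used) % FITS_BLOCK
    return pcount, padding


# 5 rows of 168 bytes, no gap, 3000-byte heap
print(pcount_and_padding(168, 5, 0, 3000))
```

With a heap of 3000 bytes the PCOUNT value cannot be mistaken for a
block-padded size, which was the ambiguity with a heap of exactly 2880.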

> P 2, par 6, l 1: "...5 rows of each 168..."
>                             ^^

This has been reworded.

> I would also like to see a better definition of PCOUNT in the example.

This has been reworded and hopefully is now clearer.

> Re-reading all of this made me realize that, in retrospect, I am
> uncomfortable with the 32 bit signed restriction.  Here we start
> worrying about 64 bit integers but we restrict the size of the heap to
> 2 GB through the second half of the P fields.
> But I should not reopen the debate :-)

Some reasons for restricting the array length and offset to signed
integers are:
  - there is no precedent in FITS for using unsigned 32-bit integers
  - use of unsigned integers is problematic in some languages like Fortran
  - as far as I'm aware, the current software implementations of the heap
    (e.g. CFITSIO) interpret these values as signed integers

I share your discomfort about this, however, and think that perhaps in the 
future we could reverse this decision and redefine these to be unsigned 
integers.  (This is possible because doing so would not invalidate any 
existing FITS files and thus would not violate the "once FITS, always FITS" 
rule).   We need more time to evaluate all the implications before making 
this decision, so for now I think it is best to restrict these fields to be 
signed integers, but leave the door open for a change in the future.  (See 
also Preben's related comment, below.)

> As to the TDIMn replacement for 8.3.2, I cringe a bit at the thought
> of the opportunities for abuse of combining a TDIM with a P TFORM.
> Would it be helpful (again, without reopening the discussion) to give
> an indication of what usage is envisioned here?

I'm not sure what else needs to be added, since this is the FITS standard 
and not a user's guide.  There are perhaps 2 main uses of TDIMn with VLAs:

Case 1.  Every row of the VLA column contains the same size/shape image, 
except for some rows where the array has zero length (is not present). 
This may happen if an image is optional, and not necessarily applicable to 
every row of the table.  In this case the TDIMn keyword is used normally to 
give the dimensions of the image, when it is present.  The TDIMn keyword is 
ignored for those rows with no image.

Case 2.  If the VLA column contains a different size/shape image in each 
row, then the TDIMn keyword cannot be used, and instead the TDIMn value for 
each row would be given in another column of the table (whose name is 
'TDIMn' where n is the number of the VLA column).  This follows the 
'Greenbank Convention' for collapsing a constant table column into a 
keyword, or expanding a keyword whose value varies from row to row into a 
column.
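
A minimal Python sketch of how a reader might handle these two cases is
given below.  It is an illustration under stated assumptions, not wording
from the proposal: the function names are hypothetical, and the check that
the TDIM product does not exceed the array length is an assumed sanity test.

```python
import math


def parse_tdim(value):
    """Parse a TDIMn-style string such as '(3,2)' into a tuple of ints."""
    inner = value.strip().strip("()")
    return tuple(int(part) for part in inner.split(","))


def row_shape(n_elements, tdim_keyword=None, tdim_cell=None):
    """Return the array shape for one row of a VLA column.

    n_elements   -- array length from the row's descriptor
    tdim_keyword -- value of the TDIMn header keyword, if any (case 1)
    tdim_cell    -- value from a per-row 'TDIMn' column, if any (case 2)
    """
    if n_elements == 0:
        return None  # no image in this row; TDIMn is ignored
    tdim = tdim_cell if tdim_cell is not None else tdim_keyword
    if tdim is None:
        return (n_elements,)  # plain one-dimensional array
    shape = parse_tdim(tdim)
    if math.prod(shape) > n_elements:
        raise ValueError("TDIM describes more elements than the row holds")
    return shape


print(row_shape(6, tdim_keyword="(3,2)"))  # case 1, image present
print(row_shape(0, tdim_keyword="(3,2)"))  # case 1, image absent
print(row_shape(12, tdim_cell="(4,3)"))    # case 2, per-row dimensions
```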

============================================
In a separate message on 14-Dec, Preben Grosbol wrote:

> I found only one other issue namely the first modification of VLA which
> explicit include P column in the types effected by TNULLn.  My two
> comments are:
>   1) Since specifying a zero array length has the same effect, TNULL
>       values in P columns are not needed.

That will work only if the entire array is null; we still need to use TNULLn
to define a null value if only some elements of the variable length array
are null.
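
The distinction can be sketched in Python as follows.  The function name is
hypothetical, and using None to represent a null is simply a choice made for
this illustration.

```python
def apply_tnull(values, tnull=None):
    """Mark null elements of one integer variable length array.

    A zero-length array stands for an entirely null entry; TNULLn is
    needed to flag individual null elements within a non-empty array.
    """
    if len(values) == 0:
        return None  # whole array is null (zero array length)
    if tnull is None:
        return list(values)  # no TNULLn defined for this column
    return [None if v == tnull else v for v in values]


print(apply_tnull([], tnull=-999))             # entire array null
print(apply_tnull([4, -999, 7], tnull=-999))   # one null element
```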

>   2) We have signed integer so in principle we should explicitly define
>       that VLA elements with either length or offset being negative are
>       regarded as undefined values.

The following sentence has been added to the draft proposal:  "The meaning 
of a negative value for either of these integers is undefined by this 
standard."  This leaves an opening for future experimentation with using 
unsigned integers (large unsigned integers are equivalent to negative signed 
integers) in these fields.
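
The equivalence can be seen in a short Python sketch.  It assumes only that
a 'P' descriptor is a pair of big-endian 32-bit integers (array length, then
heap offset); the function name is hypothetical.

```python
import struct


def decode_p_descriptor(raw):
    """Decode an 8-byte 'P' descriptor read from a table row.

    Returns (length, offset, defined), with both integers interpreted as
    signed; 'defined' is False when either value is negative.
    """
    length, offset = struct.unpack(">ii", raw)
    defined = length >= 0 and offset >= 0
    return length, offset, defined


# The same 8 bytes, written as an unsigned length of 3,000,000,000 ...
raw = struct.pack(">II", 3_000_000_000, 0)
# ... read back as a negative length under the signed interpretation.
print(decode_p_descriptor(raw))
print(struct.unpack(">II", raw))
```

Any unsigned value of 2**31 or more appears negative to a reader that
assumes signed integers, so such a reader can detect it rather than
silently misreading the heap.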

-- 
____________________________________________________________________
Dr. William Pence                          William.D.Pence at nasa.gov
NASA/GSFC Code 662         HEASARC         +1-301-286-4599 (voice)
Greenbelt MD 20771                         +1-301-286-1684 (fax)