ASCII table tricks

Rob Seaman seaman at noao.edu
Thu Sep 5 04:17:38 EDT 1996


Steve Allen asserts:

>I believe the standard is saying that the data fields themselves may
>only contain printable text, but that portions of the table outside
>the fields may contain other ASCII characters.

Ed Groth says:

> Well, the standards writers certainly never intended that one play
> tricks and we certainly don't need people to look for loopholes and
> take advantage of them.

Steve counters with:

> Whether intentional or not the standards documents seem to have
> permitted a format which could make the data in ASCII tables much
> more accessible to everyone.  Is this a bad thing that must be
> squelched or a good thing that should be pursued?

I'll register my vote against squelching...

Another thread in this newsgroup also just touched (tangentially) on
the question of ambiguous FITS usage.  It seems to me that FITS can
be viewed as ambiguous in a couple of different ways.

The first is by intent.  As with other standards such as X11 and
PostScript, FITS mandates no strong usage policy of its own, but relies
on contingent standards (or software) to define the policies.  Examples
are Motif for X11 and EPSF for PostScript - while FITS relies on header
keyword conventions (for instance) that are put forth by the different
segments of the astronomical community, or de facto usage as defined
by AIPS, IRAF or whatever.

Like other standards, FITS can also be unintentionally ambiguous (and
hopefully only ambiguous and not internally inconsistent).  Of course,
there are also always examples of data that are simply non-conforming.

The usual way to allow for ambiguity is for software that writes data
structures to strictly adhere to the standard, but for software that
interprets the data structures to only loosely require adherence to
the standard.  Ideally a FITS reader would require only the loosest
possible self-consistent interpretation of the standard.
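
As a rough illustration of that division of labor, here is a minimal
Python sketch.  The names write_row and read_row and the two-field
layout are invented for this example and belong to no FITS library;
the writer refuses anything but printable ASCII, while the reader
slices by field width and quietly discards stray control bytes.

```python
PRINTABLE = range(0x20, 0x7F)  # ASCII space (32) through tilde (126)


def write_row(fields, widths):
    """Strict writer: left-justify each value in its width and reject
    anything that overflows or is not printable ASCII."""
    row = "".join(str(value).ljust(width)
                  for value, width in zip(fields, widths))
    if len(row) != sum(widths):
        raise ValueError("a value overflows its field width")
    if any(ord(ch) not in PRINTABLE for ch in row):
        raise ValueError("non-printable character in row")
    return row.encode("ascii")


def read_row(row, widths):
    """Lenient reader: slice by width, then drop any byte that is not
    printable ASCII before stripping the padding."""
    values, pos = [], 0
    for width in widths:
        raw = row[pos:pos + width]
        text = bytes(b for b in raw if b in PRINTABLE).decode("ascii")
        values.append(text.strip())
        pos += width
    return values
```

Whether a real reader should be this forgiving is exactly the policy
question at issue; the sketch only shows that the lenient side is
cheap to implement.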

Question:  Does the robust functioning of a FITS table reader really
require that only printable ASCII characters appear in the data array?

The NOST standard is more ambiguous than the original paper (A&A Suppl
73, p365) about the question of non-printing characters.  While a little
maneuvering room is still available in Harten et al., it's hard to argue
that the authors' original intent was to allow non-printing ASCII anywhere
in the data array.  This surely would have been discussed in the paper -
precisely because it's a useful feature as Steve has pointed out.

However, I don't think we need be inordinately solicitous of the intent
of the standards writers (or rather, of the apparent lack of intent in
this case).  The astronomical community governs FITS, not the other way
around.  If the wording of the standard allows a particular usage, then
either the wording should be changed - or the usage should be accepted.

One could even use the golden rule of "once FITS, always FITS" to argue
that any ambiguity that creeps through is instantly incorporated into
the standard for all time.  If nothing else, this certainly argues for
wording every proposed change to the standard with extreme care.

But I think we should think twice about rejecting all "loopholes" out
of hand...  A lot of the FITS deliberations lately have arisen out of
these same loopholes - or out of the seams between different parts of
the standard.  I'd argue that this is a sign of the maturity of FITS.
Issues that were left unaddressed by the early history of FITS are
now of increasing importance.  Without the elbow room provided by the
(intentional or unintentional) ambiguity of the standard, FITS would
be handcuffed when dealing with the current issues.

An alternate interpretation is that each change to the standard
should be as tightly constraining as possible, since it will always
be possible to relax the constraints later should that prove to be
desirable.  Perhaps including non-printing characters in FITS ASCII
tables is an example of this.

Note that even if the original tables specification is reaffirmed as
not allowing non-printing characters, adding this as a new feature is
entirely consistent with the rules of FITS.  No preexisting data would
be harmed.  Software might break, but it would likely break in the
politically accepted manner - by totally bombing, that is.  On the
other hand, it would also be unsurprising if current tables software
passed over non-printing characters with nary a burp.
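
To see why a reader might never notice, consider one that locates
fields purely by starting column and width, as the TBCOLn and TFORMn
keywords describe.  The sketch below is only an assumption about how
such a reader could be written; extract_fields, the column layout and
the row values are all made up for the example.

```python
def extract_fields(row, columns):
    """Return the stripped text of each field in one table row.

    columns is a list of (tbcol, width) pairs, with tbcol counted
    from 1 in the manner of the FITS TBCOLn keyword.
    """
    return [row[tbcol - 1:tbcol - 1 + width].decode("ascii").strip()
            for tbcol, width in columns]


# Hypothetical layout: three fields with a one-byte gap after each.
columns = [(1, 8), (10, 6), (17, 7)]

# The same row twice: gaps filled with blanks, then with tab/newline.
plain = b"OBJ00001" + b" " + b"  11.9" + b" " + b"0.01756" + b" "
tabbed = b"OBJ00001" + b"\t" + b"  11.9" + b"\t" + b"0.01756" + b"\n"

fields_plain = extract_fields(plain, columns)
fields_tabbed = extract_fields(tabbed, columns)
```

Both calls return the same three strings, because the bytes in
columns 9, 16 and 24 are never examined.  A reader that validates
every byte of the row would of course behave differently.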

Steve's suggestion appears to make ASCII tables much easier to use.
Non-printing characters, especially between fields if not in the fields
themselves, do not complicate the standard to any great extent.

Meanwhile, Lucio Chiappetti wonders:

> Is anyone using FITS ASCII tables at all nowadays, or not only BINTABLEs ?

Folks likely will continue to use ASCII tables for the same purposes
they originally did - for large catalogs and as an easily accessible
human readable format.  Note that ease of use was integral to the
original design and may certainly continue to distinguish ASCII from
binary FITS tables.  Also, notwithstanding that Harten et al. state
that "...  the format is primarily intended to be used for the transfer
of information rather than the storage of information", ASCII tables
are likely to be preferred for archival applications.

Why am I showing any interest in this issue?  Well, it occurs to me that
by permitting non-printing ASCII field and record separators, it would be
immediately possible to store a /rdb database in a FITS table extension -
or rather, a legal FITS table could be operated on and maintained by this
relational database.

/rdb is a Unix tools style RDB that uses ASCII files with tab delimited
fields.  It can support surprisingly large databases - 800,000 image
headers from the NOAO archive occupy about 145 MB in one R&D database
we've constructed.  Separate binary index files can be generated as
needed.  By design, /rdb encourages access to the DB using tools such
as awk, sed, perl and the Unix shells.  This is precisely the same
functionality that Steve has suggested for FITS tables.
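
A small sketch of the idea, in Python rather than awk: if the gap
byte after each field is a tab and the last byte of each row is a
newline, the same data array can be read either by fixed columns or
by splitting on delimiters.  Everything here is hypothetical (the
NAXIS1 value, the column layout, the function names and the data),
and the sketch ignores the FITS header, block padding and whatever
header lines /rdb itself expects.

```python
COLUMNS = [(1, 8), (10, 6), (17, 7)]  # hypothetical (TBCOLn, width) pairs
NAXIS1 = 24                           # bytes per row, separators included

# Two made-up rows; a tab follows fields 1 and 2, a newline ends the row.
data = (b"OBJ00001" + b"\t" + b"  11.9" + b"\t" + b"0.01756" + b"\n" +
        b"OBJ00002" + b"\t" + b"  9.25" + b"\t" + b"1.50000" + b"\n")


def rows_by_width(data, naxis1, columns):
    """Read the array the FITS way: fixed-length rows, fixed columns."""
    rows = [data[i:i + naxis1] for i in range(0, len(data), naxis1)]
    return [[row[c - 1:c - 1 + w].decode("ascii").strip()
             for c, w in columns]
            for row in rows]


def rows_by_delimiter(data):
    """Read the array the Unix-tools way: newline ends a record and
    tab ends a field, as awk would with a tab field separator."""
    lines = data.decode("ascii").split("\n")
    return [[field.strip() for field in line.split("\t")]
            for line in lines if line]


same = rows_by_width(data, NAXIS1, COLUMNS) == rows_by_delimiter(data)
```

The two readings agree only as long as no field value itself contains
a tab or newline, which is the same restriction a tab delimited
database imposes anyway.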

Note that while some scripting languages nominally support remappable
field and record delimiting characters (perhaps providing a way of
working with FITS tables using generic tools from the other end) -
these are only marginally useful in practice.  Embarrassingly, probably
the best known example of such usage for shell scripts is as the
prototypical setuid security hole.

Allowing easy user level FITS support for such text based tools would
indeed require permitting tabs, newlines and perhaps other ASCII
control characters into the data.  Is this feature worth the trouble?
I think so.  Is this extremely dangerous?  No.  Is this something
to worry about right now?  Opinions will differ.

On the other hand, constraining tabular data arrays (not only FITS)
to contain only printable ASCII makes many classes of applications
that rely on "out-of-band" data (non-printing control characters,
in the case of ASCII) impossible.

Rob
-- 
seaman at noao.edu, http://iraf.noao.edu/~seaman
NOAO, 950 N Cherry Ave, Tucson AZ 85719, 520-318-8248
PGP: 98 8D 8B 49 74 9A 41 88  3A 43 87 54 51 BF 30 4B



