[fitsbits] Proposed Changes to the FITS Standard
Rob Seaman
seaman at noao.edu
Sun Aug 19 13:15:44 EDT 2007
Perhaps there is a consensus building to back off from an absolute
ban on duplicate keywords. Here's my reply to Bill's latest in any
event - kind of a "Tao of FITS". Apologies for the length:
> - keyword values are restricted to be a single value, not an array
> - logical keyword values must consist of a single T or F followed
> only by a space or a slash character
> - integer and float keyword values must not contain embedded spaces
> - complex keyword values must be enclosed in parentheses
> - no other keywords may intervene between the mandatory keywords in
> the primary array or extension
> - the TFORM keyword values must be upper case (e.g., F5.2, not f5.2)
These are all (except perhaps the last) rare occurrences. In that
case, a newly placed requirement is more like a clarification of the
standard than a change. Duplicate keywords, on the other hand, are a
frequent occurrence (thus the interest in eliminating them :-). "Once
FITS, always FITS" may never have been in serious question before.
> Imposing a new requirement on software systems to read the last
> instance of the keyword would likely have a lot of negative
> repercussions.
No more so than imposing a new requirement to detect and act on
duplicate keywords. The difference is that outlawing duplicates
doesn't fix those systems reading indeterminate values.
We might ask what an ideal FITS library or application should do on
encountering various exceptions. Whether or not duplicate keywords
are outlawed, deprecated or ignored, feeding such input to our
software will remain a frequent event. We can't legislate moral
behavior, rather only the consequences for detected immorality.
It seems to me that in this imperfect world it would be better if the
major FITS software packages adopted a coherent behavior on
encountering duplicate keywords. A header with duplicate FITS
keywords is not a bug. Currently, it is perfectly legal FITS, if
questionable practice. This cannot now be rescinded (except with
some form of HDU-level versioning, I still assert). But even if
duplicate keywords were illegal FITS, the question remains of what
FITS software should do upon encountering them - and how our code
should recognize the fact in the first place.
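
As a rough illustration of the recognition step, here is a minimal Python sketch. It assumes the header has already been split into card strings with the keyword name in the first 8 characters; the function name find_duplicates and the sample cards are hypothetical, and real cards would be padded to 80 characters.

```python
from collections import defaultdict

# Commentary keywords may legitimately repeat, so they are skipped.
COMMENTARY = {"COMMENT", "HISTORY", ""}

def find_duplicates(cards):
    """Map each repeated keyword to the card indices where it occurs."""
    positions = defaultdict(list)
    for index, card in enumerate(cards):
        keyword = card[:8].rstrip()
        if keyword == "END":
            break
        if keyword in COMMENTARY:
            continue
        positions[keyword].append(index)
    return {kw: idx for kw, idx in positions.items() if len(idx) > 1}

# Hypothetical sample header (cards shortened for readability).
cards = [
    "SIMPLE  =                    T",
    "BITPIX  =                   16",
    "NAXIS   =                    0",
    "EXPTIME =                 30.0",
    "COMMENT first note",
    "EXPTIME =                 45.0",
    "COMMENT second note",
    "END",
]
print(find_duplicates(cards))
```

The sketch only reports where duplicates occur. What to do about them is left to the caller, which is the policy question at issue here.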
> Requiring all software systems to follow the same behavior is not
> practical, so the only sure way to prevent users from getting an
> incorrect result when analyzing the file is to eliminate duplicate
> keywords in the first place.
You cannot avoid the question of what software is required to do by
outlawing data. Those data can and will continue to be presented as
input to our software. Perhaps there is some notion that we'll
require all archives and data providers to scrub their data. This is
at least as impractical a requirement as you describe for the
software - more to the point, who will data providers turn to for
software to perform such scrubbing? One way or the other, if we
tackle this issue our software will have to detect duplicate keyword
instances and take some action as a result.
> There is less harm if the duplicated keywords all have the same
> value, so maybe the wording of this requirement should be modified
> to take this into account.
This strikes me as the sort of contingent action that indicates the
primary action is ill-conceived. As for the software, it is
simply another requirement placed on top of the first. Look for
duplicate keyword names, then look for duplicate values - would the
next step be a test for duplicate comments?
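
To make the layering concrete, here is a minimal Python sketch of the second test stacked on the first. It assumes the header has already been parsed into (keyword, value) pairs; the function name classify_duplicates and the sample data are hypothetical.

```python
from collections import defaultdict

def classify_duplicates(pairs):
    """Split repeated keywords into same-valued and conflicting groups."""
    values = defaultdict(list)
    for keyword, value in pairs:
        values[keyword].append(value)

    same_valued, conflicting = [], []
    for keyword, seen in values.items():
        if len(seen) < 2:
            continue  # first test: is the keyword duplicated at all?
        if all(v == seen[0] for v in seen):
            same_valued.append(keyword)  # second test: do the values agree?
        else:
            conflicting.append(keyword)
    return same_valued, conflicting

# Hypothetical parsed header.
pairs = [
    ("OBSERVER", "Smith"),
    ("EXPTIME", 30.0),
    ("OBSERVER", "Smith"),
    ("EXPTIME", 45.0),
    ("FILTER", "V"),
]
print(classify_duplicates(pairs))
```

Note that the value comparison cannot even begin until the duplicate-name scan has been done, so the relaxed wording adds work rather than removing it.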
> some of your FOREIGN extensions have the order of these 2 keywords
> reversed.
We'll look into the behavior you describe. I would expect most
extension types, including FOREIGN, to be conformable to this more
strict keyword ordering whether it is required or merely preferred.
In addition to clarifying the ordering of PCOUNT/GCOUNT, this may be
a good time to state this more clearly for all the mandatory keywords
(section 4.4). In particular, the ordering of NAXISn is never
explicitly restricted to increasing numerical order. The only
statement for any of the mandatory keywords is presented in table
4.5, which suggests NAXISn be ordered, but never outright says it.
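
For illustration, a strict reading of the ordering could be checked with a few lines of Python. This is only a sketch for a primary header under the assumption that NAXISn must appear in increasing numerical order directly after NAXIS; the function name mandatory_order_ok is hypothetical.

```python
def mandatory_order_ok(keywords, naxis):
    """Check that a primary header starts with SIMPLE, BITPIX, NAXIS,
    then NAXIS1..NAXISn in increasing order with nothing in between."""
    expected = ["SIMPLE", "BITPIX", "NAXIS"]
    expected += [f"NAXIS{i}" for i in range(1, naxis + 1)]
    return keywords[:len(expected)] == expected

# Keyword names in the order they appear in two hypothetical headers.
ordered = ["SIMPLE", "BITPIX", "NAXIS", "NAXIS1", "NAXIS2", "EXTEND"]
swapped = ["SIMPLE", "BITPIX", "NAXIS", "NAXIS2", "NAXIS1", "EXTEND"]
print(mandatory_order_ok(ordered, 2), mandatory_order_ok(swapped, 2))
```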
>>> 3. Embedded space characters are now forbidden within numeric
>>> values in an ASCII Table (e.g. "1 23 4.5" is no longer
>>> allowed to represent the decimal value 1234.5)
>>
>> Again - are there any examples of such usage in the field?
>
> No, as far as we know. If there are any, then it is very likely
> that most current software systems do not support embedded spaces
> in the value and will silently read an incorrect value, or will
> exit with an error. Thus, it seems better to me to outlaw this
> usage rather than just not recommend it or deprecate it.
Again, the question is whether it is more productive to attempt to
outlaw something or to describe what steps software should take upon
encountering the usage. If there are no known instances, "outlawing"
is equivalent to clarifying the standard. This is likely such a
case. If there are many instances, I don't think we can escape from
taking a position on what the software should do.
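
The two positions a reader could take on a field such as "1 23 4.5" can be sketched in Python as follows. The function name parse_numeric_field and its flag are hypothetical, and the sketch handles only forms that Python's float() accepts (for example, not a Fortran-style D exponent).

```python
def parse_numeric_field(text, allow_embedded_spaces=False):
    """Parse one numeric ASCII-table field.

    Leading and trailing blanks are always ignored. Embedded blanks are
    either rejected (strict) or squeezed out (lenient).
    """
    stripped = text.strip()
    if " " in stripped:
        if not allow_embedded_spaces:
            raise ValueError(f"embedded space in numeric field: {text!r}")
        stripped = stripped.replace(" ", "")
    return float(stripped)

# Lenient reading of the example from the proposal.
print(parse_numeric_field("1 23 4.5", allow_embedded_spaces=True))

# Strict reading: the same field is reported as an error.
try:
    parse_numeric_field("1 23 4.5")
except ValueError as exc:
    print("rejected:", exc)
```

Either way the reader has to notice the embedded blank, so the detection code is needed whether the usage is outlawed or merely described.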
> I don't really see any practical benefit to having a version
> keyword. Either the software will support a new requirement, or it
> won't; the presence of a version (or DATE) keyword isn't really
> helpful, except maybe to a human reading the header.
I don't understand. The software would interpret the version to know
if the new requirement should be enforced for a particular HDU. In
the absence of such versioning (by token or date), the software has
to follow some sloppy heuristic to let the nuances of the data guide
its behavior. The other two new requirements on the table strike me
as clarifications and can go forward without versioning, perhaps with
some tweaking of the language. I'm not sure about the EXTEND keyword.
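
As a sketch of what interpreting a version might look like, consider the Python below. Everything specific in it is an assumption made for illustration: the FITSVERS keyword does not exist in the standard, the version number 3.0 is a placeholder for whichever version introduces the ban, and the header is modelled as a plain dict.

```python
def duplicates_forbidden(header, ban_version=(3, 0)):
    """Decide whether a duplicate-keyword ban applies to this HDU.

    'FITSVERS' is a hypothetical version keyword holding a string such
    as '3.0'; ban_version is a placeholder, not a real version number.
    """
    raw = header.get("FITSVERS")
    if raw is None:
        # Unversioned HDUs predate the ban and are grandfathered in.
        return False
    version = tuple(int(part) for part in raw.split("."))
    return version >= ban_version

print(duplicates_forbidden({"FITSVERS": "3.0"}))  # ban applies
print(duplicates_forbidden({"BITPIX": 16}))       # grandfathered
```

Without some such token the software has nothing explicit to test and is left with the heuristics described above.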
I'm not a big fan of introducing versioning myself, but the clear
implication of avoiding versioning is that duplicate keywords cannot
be gracefully banned after the fact. In fact, consider a situation
in which the choice had been made to ban them back in the FITS
Dreamtime - exactly the same stringent software requirements would
apply: detect instances and take application-dependent action.
Our libraries and applications would be more complex now as a
result. (Arguably better, but certainly more complex.) Banning
duplicates doesn't avoid significant new software requirements, it
mandates them.
> The proposed new statement ("Existing FITS files that conformed to
> the latest version of the standard at the time the files were
> created are expressly exempt from any new requirements imposed by
> subsequent versions of the standard.") is, I think, mainly intended
> as a political statement to reassure institutions that the FITS
> committees are not imposing new unfunded mandates that require
> modifications to existing FITS archives. I don't see this
> statement as having much relevance to the way software is implemented.
You can't avoid the unfunded mandate this way. Any software seeking
to follow the letter of the standard would still have to detect
instances of duplicate keywords and take some action. What
statements like this do is to encourage folks to treat the standard
as some floppy set of guidelines and conformance to the standard as
an optional nicety for polite society.
A file either conforms to the FITS standard or it does not. A ban on
duplicate keywords is unenforceable unless it is paired with
versioning. The statement above would fail to impress a lawyer since
it isn't paired with a way for either humans or computers to
determine which files were grandfathered in. Further, there is a
sense of legal entrapment in promulgating such a new requirement with
no realistic way to encourage instrument teams and others to redesign
their systems to avoid duplicates. For instance, the ICE/ccdacq
software permits observers to enter their own file of keywords,
perhaps including duplicates. Users can trivially use IRAF hedit to
add duplicates, etc. Perhaps there is no way to duplicate a keyword
with CFITSIO? Who would enforce the ban?
In any event, the FITS standard should be kept free of political
statements.
> This is missing the main point of this new requirement. No current
> software system that I am aware of (except for the FITS verifier
> code) checks for duplicated keywords, so users have no idea which
> of the duplicated keywords is being used by a particular program.
> The software might be using the first, the 'next', or the last
> instance of the keyword.
Well, as I said, iSTB throws an error if duplicate structural
keywords are encountered. After 10 million files, I don't think I've
ever seen this particular error in BITPIX, NAXISn, PCOUNT, GCOUNT or
EXTENSION. We did just happen to see duplicate SIMPLE keywords while
commissioning a new instrument. The problem was detected, reported
and fixed. On the other hand, there are numerous ongoing examples of
duplicated user keywords. It seems to me that applications should
only be sensitive to header abnormalities that affect their own
functionality.
Instituting an absolute ban is meaningless unless all our software
systems become aware of all possible duplicates. We can't just dump
the responsibility on the users to avoid creating them in the first
place unless our own software that they are using to create or update
the HDUs aids in that task.
This ban is attempting to avoid placing natural requirements on
software by placing unnatural ones on the data. Not only is it
unenforceable - the software requirements just pop up again elsewhere.
> This could easily cause the user to derive incorrect scientific
> results. What is the best way to prevent this from happening?
This is the heart of the matter. As Dick says, there is no single
simple solution. We should encourage data providers (and users) to
avoid duplicate keywords. We should understand why such keywords may
be created in the first place. Our major software packages should
reach agreement on a common strategy should duplicates be encountered
- whether that means the behavior remains indeterminate, or that the
first or the last instance takes precedence. Applications
should detect duplicates which affect their functionality as with any
other header peculiarities. Libraries should provide routines and
utility programs for validating HDUs against a wide variety of
exceptions, including duplicate keywords.
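
The candidate strategies can be written down as a single lookup routine with a selectable policy. The Python below is a minimal sketch, assuming the header is available as an ordered list of (keyword, value) pairs; the function name lookup and the policy names are hypothetical.

```python
def lookup(pairs, keyword, policy="last"):
    """Return the value of keyword under an explicit duplicate policy.

    policy is one of:
      'first' - the first instance wins
      'last'  - the last instance wins
      'error' - any duplicate raises ValueError
    """
    matches = [value for name, value in pairs if name == keyword]
    if not matches:
        raise KeyError(keyword)
    if policy == "first":
        return matches[0]
    if policy == "last":
        return matches[-1]
    if policy == "error":
        if len(matches) > 1:
            raise ValueError(f"{keyword} appears {len(matches)} times")
        return matches[0]
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical header with a duplicated exposure time.
pairs = [("EXPTIME", 30.0), ("FILTER", "V"), ("EXPTIME", 45.0)]
print(lookup(pairs, "EXPTIME", "first"), lookup(pairs, "EXPTIME", "last"))
```

Which policy is chosen matters less than that the major packages choose the same one and document it.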
A duplicated keyword is just one of a long list of poor header
construction techniques that can't be fixed simply by demanding they
not occur.
> Seems to me we should focus on the root of the problem and
> (formally at least) disallow duplicated keywords in a conforming
> FITS file. This doesn't mean software should automatically throw
> out a file that inadvertently has a duplicated keyword.
"Formal" is the essence of a standard. I guess the notion is that
deprecation hasn't proven strong enough so perhaps an absolute ban
might do the trick? In the absence of practical consequences, what
this really does is call the integrity of the standard into question.
> I think the seriousness of this problem depends on what keyword is
> duplicated. If it is just some observatory-specific keyword that
> does not directly affect the scientific results, then it does not
> matter very much, and data providers need not worry about it. But
> if a critical WCS keyword, or exposure time keyword is duplicated
> in the file with different values, then surely the data providers
> need to take responsibility and fix the problem.
Whether the issue is duplicate keywords or some other keyword
misformatting, there is more pressure on the data providers already
to fix significant occurrences than this technical change to the FITS
standard would apply. On the other hand, for the much more frequent
case of unintentionally duplicating some non-critical keyword, this
change would be outlawing files for no benefit and a lot of
annoyance. In either case, the software faces exactly the same
requirements.
Rob