[fitsbits] XTENSION = 'FITS' proposal
Doug Tody
tody at tucana.tuc.noao.edu
Thu Apr 11 10:34:51 EDT 2002
Hi All -
We have yet another one of these here which has been implemented
and in use internally for several years. We call it "foreign file
encapsulation", and as Preben says it is used for observing logs and
other auxiliary information which we want to encapsulate in FITS and
propagate into our archive. The extension allows arbitrary files
or byte streams to be encapsulated in a FITS extension and later
regenerated, including restoring the original file name and MIME type.
I dug up some notes from back when we worked on this, and these are
appended below.
- Doug
On Thu, 11 Apr 2002, Preben Grosbol wrote:
> On Wednesday 10 April 2002 23:36, Perry Greenfield wrote:
> > Proposal for a "FITS" Extension (XTENSION = 'FITS')
>
> I have some sympathy for a proposal of this kind but it is too limited
> in scope. You may recall that there already is a reserved extension
> type XTENSION = 'DUMP' which was intended for binary dumps
> (ref. NOST 100-2.0). One may find a better name for such byte
> stream extensions (e.g. 'BSTREAM') but one would sometimes like
> to store additional information together with binary data in a FITS
> file. Typical examples would be:
> 1) Observing logs in e.g. XML format
> 2) Reduction logs in e.g. ASCII text
> 3) Binary dumps from an instrument
> 4) Graphs in e.g. PostScript format
> 5) A paper in e.g. LaTeX format
> 6) or as you mention another FITS file.
> This would call for an extension like:
>
> XTENSION = 'BSTREAM ' or 'DUMP '
> BITPIX = 8
> NAXIS = 1
> NAXIS1 = <size of byte stream in bytes>
> BSTYPE = 'data type'
>
> where 'data type' should be controlled and possibly have values
> like 'NONE', 'FITS', 'XML', 'POSTSCRIPT', 'LATEX', 'PDF', 'TEXT'
> etc. For 'TEXT' type, one could add a keyword to give the character
> encoding.
>
> This type of extension would be much more general but still serve
> the purpose of providing a hierarchical structure for your FITS files.
> Preben Grosbol
----
FITS FOREIGN FILE ENCAPSULATION
May 1999
Foreign File Extension
The new extension type puts a FITS wrapper about an arbitrary file, allowing
a file or tree of files to be wrapped up in FITS and later restored to
disk. This mechanism also provides a means for associating a group of
FITS extensions of any type. Certain of the file attribute keywords can
be included in the header of any FITS file or extension to support such
things as storing a directory tree containing images, tables, and other
non-FITS types of files in a FITS MEF file, and later restoring the whole
tree to disk.
File Types
text file
A file containing only text. Stored 8 bits per character using
newline to delimit lines of text (like Unix).
binary file
Any file which is not a text file or one of the known file types.
Stored as a byte stream without any conversion.
fits file
Any FITS file or FITS extension, regardless of the extension type.
This has to include MEF files as well.
directory
symlink
Hard links, special files, etc. are not recognized or supported (the
writer task might recognize these but would exclude them).
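As a rough illustration of how a writer might sort input files into the types above, here is a minimal Python sketch. The function name `classify`, the sample size, and the text-versus-binary heuristic (no NUL bytes or stray control characters in the first few kilobytes) are assumptions made for this example; the notes do not say how the writer tells text from binary.

```python
import os
import stat

def classify(path, sample_size=4096):
    """Return 'symlink', 'directory', 'fits', 'text' or 'binary',
    or None for files the writer would exclude (hypothetical helper)."""
    st = os.lstat(path)  # lstat reports a symlink itself, without following it
    if stat.S_ISLNK(st.st_mode):
        return "symlink"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    if not stat.S_ISREG(st.st_mode):
        return None  # special files (devices, FIFOs, sockets) are excluded
    with open(path, "rb") as f:
        head = f.read(sample_size)
    # A FITS file or bare extension starts with one of these header cards.
    if head.startswith(b"SIMPLE  =") or head.startswith(b"XTENSION="):
        return "fits"
    # Assumed heuristic: text has no control bytes besides common whitespace.
    allowed_controls = b"\t\n\r\f"
    if any(b < 0x20 and b not in allowed_controls for b in head):
        return "binary"
    return "text"
```

A hard link is simply seen as an ordinary regular file here, so the link itself is not recorded.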
Output File Format
The output host file (or byte stream) is a conventional FITS file
consisting of a sequence of one or more FITS extensions, optionally
preceded by a dataless PHU describing the entire file. Writing of
the PHU may be disabled even if a file is being written to disk (e.g.
when writing a sequence of extensions to be concatenated).
Foreign files (text, binary, directory, symlink) are wrapped as single
extensions with XTENSION=FOREIGN. Single FITS images without extensions
are converted to IMAGE extensions, writing a single extension to the
output stream.
MEF files in the input are written unchanged except that keywords are
added to the first HDU to identify the MEF group (subsequent extensions
are merely copied to the output stream unchanged). If the first HDU in
the input file is a PHU it is converted to an IMAGE extension. The
order of the extensions in the output stream must match that in the
input MEF for the MEF to be later restored to disk. The PHU and all
extensions in the input MEF are still visible in the output file;
their association as an MEF grouping is evident only by examining the
FG keywords in the HDU. Any internal MEF associations, such as for
inheritance, are still present, but might not be recognized by most
software until the MEF group is later restored to a file.
By default the output stream will have a dataless PHU describing the
contents of the file (this can be disabled as mentioned above). The
PHU may optionally include a table of contents for the output file.
If a TOC is generated this will require that the output file list be
fully processed to determine the type and size of each input file,
before writing out the PHU with TOC followed by the input data files.
This might be desirable in any case to simplify the code (construction
of the input file list can be separated from file conversion and output).
Foreign File Extension
0 1 2 3 4
1234567890123456789012345678901234567890
XTENSION= 'FOREIGN '
BITPIX = 8
NAXIS = 0
PCOUNT = <filesize> / File size in bytes
GCOUNT = 1
EXTNAME = '<filename>'
EXTVER = 1
CHECKSUM= <checksum>
The extension name above is used only to identify the extension in
listings. To restore a file to disk the "FG" (file group) keywords
are used as outlined below.
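To make the layout above concrete, the following Python sketch assembles such an extension from a file name and its bytes. It is only an illustration under stated assumptions: the helper names `card` and `foreign_extension` are invented here, the CHECKSUM and FG keywords are left out, and padding the data with zero bytes to a 2880-byte boundary follows the usual FITS convention rather than anything stated in these notes.

```python
BLOCK = 2880  # FITS logical record size in bytes
CARD = 80     # length of one header card

def card(keyword, value, comment=""):
    """Format one fixed-format header card (strings and integers only)."""
    if isinstance(value, str):
        # Opening quote in column 11, value padded to at least 8 characters.
        field = ("'" + value.replace("'", "''").ljust(8) + "'").ljust(20)
    else:
        field = str(int(value)).rjust(20)  # right-justified to column 30
    line = f"{keyword:<8}= {field}"
    if comment:
        line += f" / {comment}"
    if len(line) > CARD:
        raise ValueError(f"card too long for keyword {keyword}")
    return line.ljust(CARD)

def foreign_extension(filename, data):
    """Wrap a byte string as a FOREIGN extension (header plus padded data)."""
    cards = [
        card("XTENSION", "FOREIGN"),
        card("BITPIX", 8),
        card("NAXIS", 0),
        card("PCOUNT", len(data), "File size in bytes"),
        card("GCOUNT", 1),
        card("EXTNAME", filename),
        card("EXTVER", 1),
        "END".ljust(CARD),
    ]
    header = "".join(cards).encode("ascii")
    header += b" " * (-len(header) % BLOCK)   # header is padded with blanks
    body = data + b"\0" * (-len(data) % BLOCK)  # assumed: data padded with zeros
    return header + body
```

For example, `foreign_extension("obslog.xml", b"<log/>")` yields one header block followed by one data block.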
Keywords
To be able to later unpack a FG stream and restore files to disk,
a number of keywords must be added to the extension headers to store
the information required to restore the files. These are the "FG"
keywords. The FG keywords are used in both "FOREIGN" type extensions
and in standard FITS extensions such as IMAGE, BINTABLE, and so on.
FG_GROUP
Each time a file group is written a group name is assigned. The
group name associates all of the elements of a group. Assuming
the group name is unique (no checks are made) then this can be
used to associate all the extensions in a group for later
restoration. This is useful if groups are concatenated in a larger
sequence of extensions. The group name is arbitrary (like a
filename) and is assigned by the user when the file group is
written. For example, a group name for a directory tree might be
the name of the root directory. It is up to the writer program
to assign a group name if the user does not predefine one.
FG_FNAME
The filename of the file associated with the current extension.
The maximum filename length is 67 characters. Any printable
character except apostrophe is permitted. For an extension of
type foreign where the file type is directory, FG_FNAME is the name
of the directory.
FG_FTYPE
The physical file type ("text", "binary", "directory", or "symlink"),
or for a native FITS extension, the FITS type ("FITS" or "FITS-MEF").
In the case of FITS-MEF, the EHU is the first element of a MEF group.
No count of the number of extensions is given; rather, the MEF group
consists of all subsequent extensions until an EHU is encountered which
starts a new file.
FG_MTYPE
The logical or "MIME" type of the file (optional).
FG_LEVEL
The directory nesting level. All of the files in a directory are
at the same level. Foreign extensions of type directory are used
to name the directories at each level so that pathnames can be
reconstructed (this scheme assumes that the extensions in a file
group are ordered). Level 0 (zero) is the root directory of the
file group. The root directory is unnamed (but might be a logical
choice for the file group name).
FG_FSIZE
The size in bytes of the data portion of the file.
FG_FMODE
The file mode as a string ("rwx-rwx-rwx", bits not set given as "-").
FG_FUOWN
The file UID (user ID) as the file owner name string.
FG_FUGRP
The file GID (group ID) as the file group name string.
FG_CTIME
The file creation time in seconds since 1970.0 GMT.
FG_MTIME
The file modification time in seconds since 1970.0 GMT.
FG_COMP
This keyword will not be used initially, but is reserved in case
we choose to implement file (e.g. gzip) compression in the archiver.
The value would be a string such as "none" or "gzip". In the
meantime files can be archived in compressed form by compressing
them beforehand and archiving the compressed files as binary files.
Part of the reason we are reluctant to implement compression in
the archiver is that archive data may last indefinitely and it is
hard to guarantee that the compressed data will be readable a
decade or two in the future. We might need to avoid compression for
archival data unless the compression algorithms and/or code are
part of the archive as well. (This discussion refers only to
foreign files, not to compressed images).
When a file group is restored to disk the foreign file extensions will
disappear. The FG keywords in the data extensions may be removed.
Any FG keywords in the input file with the same names as the keywords
above will be replaced.
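The way FG_LEVEL, FG_FTYPE and FG_FNAME combine to rebuild pathnames can be sketched in Python as follows. This is one reading of the scheme, not the actual reader: it assumes that a directory extension at level N names a directory whose contents follow at level N+1, that the extensions arrive in their original order, and that the tuples stand in for keyword values already read from each header. The function name `restore_paths` is hypothetical.

```python
import posixpath

def restore_paths(entries):
    """entries: ordered (ftype, level, fname) tuples, one per extension.
    Returns the path of each entry relative to the unnamed root directory."""
    stack = []   # names of the directories currently open, outermost first
    paths = []
    for ftype, level, fname in entries:
        if level > len(stack):
            raise ValueError(f"{fname}: level {level} has no parent directory")
        del stack[level:]  # leave any directories deeper than this level
        paths.append(posixpath.join(*stack, fname))
        if ftype == "directory":
            stack.append(fname)  # following level+1 entries live in here
    return paths

entries = [
    ("text", 0, "README"),
    ("directory", 0, "raw"),
    ("FITS", 1, "obj001.fits"),
    ("directory", 1, "logs"),
    ("text", 2, "night1.log"),
    ("binary", 1, "dump.bin"),
]
print(restore_paths(entries))
```

Because the levels are relative to the order of the extensions, shuffling the extensions in a group would break the reconstruction.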
Task Specification
Initially the FG reader/writer programs will be host level, as part
of the new DHS system, using the existing KWDB interface for FITS keyword
manipulation. Parts of the IRAF HSI, e.g. bootlib and libos, will
probably be used for things such as following a directory tree.
The Unix versions of the tasks will be disk file oriented, not tape
oriented.
Native IRAF versions of these tasks may follow later, so that we can
make use of IRAF magtape i/o and support IRAF images. This is really
a separate problem though. For encapsulating foreign files for the
archive, host level tasks similar to the existing HSI wtar/rtar are
more what is needed.
Sample syntax:
fgwrite <flags> <input-file-template-list>
fgread <flags> <input-file>
We don't need to try to make a completely general file archiver here.
The intention is mainly to be able to use FITS to carry along and
archive some non-FITS auxiliary data. A secondary goal is to generalize
our FITS writers somewhat so that directories can be handled (archived
and later restored) as well as linear file templates.
Since the task will not be a completely general file archiver, we can
omit certain details:
symlinks to directories are not followed by the writer
unlike tar, hard links are not preserved
special files are ignored
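
A directory walk honoring these three omissions might look like the Python sketch below. The function name `walk_group`, the sorted traversal order, and the coarse "file" label for regular files are choices made for the example only; the real task would also distinguish text, binary and FITS files.

```python
import os
import stat

def walk_group(root):
    """Yield (level, kind, name) for everything under root, in an order
    from which pathnames can later be rebuilt (hypothetical helper)."""
    def visit(dirpath, level):
        for name in sorted(os.listdir(dirpath)):
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            if stat.S_ISLNK(mode):
                # Recorded as a symlink; never followed, even to a directory.
                yield level, "symlink", name
            elif stat.S_ISDIR(mode):
                yield level, "directory", name
                yield from visit(path, level + 1)
            elif stat.S_ISREG(mode):
                # A hard link shows up as a plain file; the link is not kept.
                yield level, "file", name
            # Anything else (device, FIFO, socket) is a special file: ignored.
    yield from visit(root, 0)
```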
Selected task options:
Input-file-template-list is a sequence of file names or directory
names (if it is a unix task, any templates will already have been
expanded by the shell).
There should be an option to fgwrite to specify the types of files to
be archived; when descending a directory, a file list alone will not
handle this. Hence some mechanism, such as a choice among the
supported file types (tbdsf) or a pattern matching template as in
"find -name", would be used to select the files to be archived.
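
Both selection mechanisms can be sketched in a few lines of Python. The function name `select` and its `types` and `pattern` parameters are invented for this illustration and are not fgwrite options; the pattern matching uses shell-style wildcards in the spirit of "find -name".

```python
import fnmatch

def select(entries, types=None, pattern=None):
    """Filter (ftype, name) pairs by file type and/or shell-style pattern.
    Either filter is skipped when left as None (hypothetical helper)."""
    for ftype, name in entries:
        if types is not None and ftype not in types:
            continue
        if pattern is not None and not fnmatch.fnmatchcase(name, pattern):
            continue
        yield ftype, name

candidates = [
    ("text", "night1.log"),
    ("binary", "dump.bin"),
    ("text", "notes.txt"),
]
print(list(select(candidates, types={"text"}, pattern="*.log")))
```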