[fitsbits] XTENSION = 'FITS' proposal

Doug Tody tody at tucana.tuc.noao.edu
Thu Apr 11 10:34:51 EDT 2002


Hi All -

We have yet another one of these here which has been implemented
and in use internally for several years.  We call it "foreign file
encapsulation", and as Preben says it is used for observing logs and
other auxiliary information which we want to encapsulate in FITS and
propagate into our archive.  The extension allows arbitrary files
or byte streams to be encapsulated in a FITS extension and later
regenerated, including restoring the original file name and MIME type.
I dug up some notes from back when we worked on this, and these are
appended below.

	- Doug


On Thu, 11 Apr 2002, Preben Grosbol wrote:

> On Wednesday 10 April 2002 23:36, Perry Greenfield wrote:
> > Proposal for a "FITS" Extension (XTENSION = 'FITS')
> 
> I have some sympathy for a proposal of this kind but it is too limited
> in scope.  You may recall that there already is a reserved extension
> type XTENSION = 'DUMP' which was intended for binary dumps
> (ref. NOST 100-2.0).  One may find a better name for such byte
> stream extensions (e.g. 'BSTREAM') but one would sometimes like
> to store additional information together with binary data in a FITS
> file.  Typical examples would be:
>    1) Observing logs in e.g. XML format
>    2) Reduction logs in e.g. ASCII text
>    3) Binary dumps from an instrument
>    4) Graphs in e.g. PostScript format
>    5) A paper in e.g. LaTeX format
>    6) or as you mention another FITS file.
 
> This would call for an extension like:
> 
>   XTENSION = 'BSTREAM '  or 'DUMP    '
>   BITPIX = 8
>   NAXIS = 1
>   NAXIS1 = <size of byte stream in bytes>
>   BSTYPE = 'data type'
> 
> where 'data type' should be controlled and possibly have values
> like 'NONE', 'FITS', 'XML', 'POSTSCRIPT', 'LATEX', 'PDF', 'TEXT'
> etc.  For 'TEXT' type, one could add a keyword to give the character
> encoding.
> 
>  This type of extension would be much more general but still serve
> the purpose of providing a hierarchical structure for your FITS files.
 
> Preben Grosbol




----
		       FITS FOREIGN FILE ENCAPSULATION
				  May 1999


Foreign File Extension

The new extension type puts a FITS wrapper around an arbitrary file, allowing
a file or tree of files to be wrapped up in FITS and later restored to
disk.  This mechanism also provides a means for associating a group of
FITS extensions of any type.  Certain of the file attribute keywords can
be included in the header of any FITS file or extension to support such
things as storing a directory tree containing images, tables, and other
non-FITS types of files in a FITS MEF file, and later restoring the whole
tree to disk.


File Types

    text file
	A file containing only text.  Stored 8 bits per character using
	newline to delimit lines of text (like Unix).

    binary file
	Any file which is not a text file or one of the known file types.
	Stored as a byte stream without any conversion.

    fits file
	Any FITS file or FITS extension, regardless of the extension type.
	This has to include MEF files as well.

    directory
    symlink

    Hard links, special files, etc. are not recognized or supported (the
    writer task might recognize these but would exclude them).


Output File Format

    The output host file (or byte stream) is a conventional FITS file
    consisting of a sequence of one or more FITS extensions, optionally
    preceded by a dataless PHU describing the entire file.  Writing of
    the PHU may be disabled even if a file is being written to disk (e.g.
    when writing a sequence of extensions to be concatenated).

    Foreign files (text, binary, directory, symlink) are wrapped as single
    extensions with XTENSION=FOREIGN.  Single FITS images without extensions
    are converted to IMAGE extensions, writing a single extension to the
    output stream.

    MEF files in the input are written unchanged except that keywords are
    added to the first HDU to identify the MEF group (subsequent extensions
    are merely copied to the output stream unchanged).  If the first HDU in
    the input file is a PHU it is converted to an IMAGE extension.  The
    order of the extensions in the output stream must match that in the
    input MEF for the MEF to be later restored to disk.  The PHU and all
    extensions in the input MEF are still visible in the output file;
    their association as an MEF grouping is evident only by examining the
    FG keywords in the HDU.  Any internal MEF associations, such as for
    inheritance, are still present, but might not be recognized by most
    software until the MEF group is later restored to a file.
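
    The grouping rule above can be sketched in Python.  This is only an
    illustration, not the actual reader: it assumes each header is already
    parsed into a dict, that every HDU which starts a file carries FG_FNAME
    and FG_FTYPE, and that the copied-through MEF extensions carry no FG
    keywords of their own.  The name group_files is hypothetical.

```python
def group_files(headers):
    """Group a sequence of extension headers (dicts) into files to restore."""
    files = []
    for index, header in enumerate(headers):
        if "FG_FNAME" in header:
            # Assumption: an FG-tagged header starts a new file.
            files.append({
                "name": header["FG_FNAME"],
                "type": header["FG_FTYPE"],
                "extensions": [index],
            })
        elif files and files[-1]["type"] == "FITS-MEF":
            # Untagged extension: continues the current MEF group.
            files[-1]["extensions"].append(index)
        else:
            raise ValueError(f"extension {index} does not belong to any file")
    return files
```

    Because no extension count is stored, the grouping depends entirely on
    the order of the extensions in the stream.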

    By default the output stream will have a dataless PHU describing the
    contents of the file (this can be disabled as mentioned above).  The
    PHU may optionally include a table of contents for the output file.
    If a TOC is generated this will require that the output file list be
    fully processed to determine the type and size of each input file,
    before writing out the PHU with TOC followed by the input data files.
    This might be desirable in any case to simplify the code (construction
    of the input file list can be separated from file conversion and output).


Foreign File Extension

    0        1         2         3         4
    1234567890123456789012345678901234567890

    XTENSION= 'FOREIGN '
    BITPIX  =                    8
    NAXIS   =                    0
    PCOUNT  =           <filesize>  / File size in bytes
    GCOUNT  =                    1
    EXTNAME = '<filename>'
    EXTVER  =                    1
    CHECKSUM= <checksum>

    The extension name above is used only to identify the extension in
    listings.  To restore a file to disk the "FG" (file group) keywords
    are used as outlined below.
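
    As a rough illustration of the layout above, the following Python
    sketch formats the listed keywords as 80-character FITS cards and pads
    the header to a 2880-byte block.  The function names are hypothetical,
    the CHECKSUM card is left out, and the FG keywords are not included;
    it is a sketch of standard FITS card formatting, not the actual writer.

```python
def card(keyword, value, comment=""):
    """Format one 80-character FITS header card (fixed format)."""
    if isinstance(value, str):
        # Strings: opening quote in column 11, padded to at least 8 characters.
        text = "'" + value.replace("'", "''").ljust(8) + "'"
        field = text.ljust(20)
    else:
        # Integers: right-justified so the last digit falls in column 30.
        field = str(value).rjust(20)
    body = f"{keyword:<8}= {field}"
    if comment:
        body += " / " + comment
    if len(body) > 80:
        raise ValueError(f"card too long for keyword {keyword}")
    return body.ljust(80)


def foreign_header(filename, filesize):
    """Build the header block for one FOREIGN extension (checksum omitted)."""
    cards = [
        card("XTENSION", "FOREIGN"),
        card("BITPIX", 8),
        card("NAXIS", 0),
        card("PCOUNT", filesize, "File size in bytes"),
        card("GCOUNT", 1),
        card("EXTNAME", filename),
        card("EXTVER", 1),
        "END".ljust(80),
    ]
    block = "".join(cards)
    # FITS headers are padded with blanks to a multiple of 2880 bytes.
    return block + " " * (-len(block) % 2880)
```

    For example, foreign_header("obslog.txt", 1234) yields a single
    2880-byte header block whose first card is XTENSION= 'FOREIGN '.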


Keywords

    To be able to later unpack a FG stream and restore files to disk,
    a number of keywords must be added to the extension headers to store
    the information required to restore the files.  These are the "FG"
    keywords.  The FG keywords are used in both "FOREIGN" type extensions
    and in standard FITS extensions such as IMAGE, BINTABLE, and so on.

    FG_GROUP
	Each time a file group is written a group name is assigned.  The
	group name associates all of the elements of a group.  Assuming
	the group name is unique (no checks are made) then this can be
	used to associate all the extensions in a group for later
	restoration.  This is useful if groups are concatenated in a larger
	sequence of extensions.  The group name is arbitrary (like a
	filename) and is assigned by the user when the file group is
	written.  For example, a group name for a directory tree might be
	the name of the root directory.  It is up to the writer program
	to assign a group name if the user does not predefine one.

    FG_FNAME
	The filename of the file associated with the current extension.
	The maximum filename length is 67 characters.  Any printable
	character except apostrophe is permitted.  For an extension of
	type foreign where the file type is directory, FG_FNAME is the name
	of the directory.

    FG_FTYPE
	The physical file type ("text", "binary", "directory", or "symlink"),
	or for a native FITS extension, the FITS type ("FITS" or "FITS-MEF").
	In the case of FITS-MEF, the EHU is the first element of a MEF group.
	No count of the number of extensions is given; rather, the MEF group
	consists of all subsequent extensions until an EHU is encountered which
	starts a new file.

    FG_MTYPE
	The logical or "mime" type of the file (optional).

    FG_LEVEL
	The directory nesting level.  All of the files in a directory are
	at the same level.  Foreign extensions of type directory are used
	to name the directories at each level so that pathnames can be
	reconstructed (this scheme assumes that the extensions in a file
	group are ordered).  Level 0 (zero) is the root directory of the
	file group.  The root directory is unnamed (but might be a logical
	choice for the file group name).

    FG_FSIZE
	The size in bytes of the data portion of the file.

    FG_FMODE
	The file mode as a string ("rwx-rwx-rwx", bits not set given as "-").

    FG_FUOWN
	The file UID (user ID) as the file owner name string.

    FG_FUGRP
	The file GID (group ID) as the file group name string.

    FG_CTIME
	The file creation time in seconds since 1970.0 GMT.

    FG_MTIME
	The file modification time in seconds since 1970.0 GMT.

    FG_COMP
	This keyword will not be used initially, but is reserved in case
	we choose to implement file (e.g. gzip) compression in the archiver.
	The value would be a string such as "none" or "gzip".  In the
	meantime files can be archived in compressed form by compressing
	them beforehand and archiving the compressed files as binary files.
	Part of the reason we are reluctant to implement compression in
	the archiver is that archive data may last indefinitely and it is
	hard to guarantee that the compressed data will be readable a
	decade or two in the future.  We might need to avoid compression for
	archival data unless the compression algorithms and/or code are
	part of the archive as well.  (This discussion refers only to
	foreign files, not to compressed images).

    When a file group is restored to disk the foreign file extensions will
    disappear.  The FG keywords in the data extensions may be removed.
    Any FG keywords in the input file with the same names as the keywords
    above will be replaced.
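
    One way the pathnames might be rebuilt from FG_FNAME, FG_FTYPE, and
    FG_LEVEL is sketched below in Python.  The notes do not say at which
    level a directory extension itself is recorded, so this sketch ASSUMES
    that a directory extension with level n names the directory whose
    files follow at level n.  The function name and the tuple layout are
    hypothetical.

```python
def restore_paths(entries):
    """Rebuild relative pathnames from ordered (name, ftype, level) tuples."""
    stack = []   # directory names from level 1 downward
    paths = []
    for name, ftype, level in entries:
        if ftype == "directory":
            # Assumption: a directory entry at level n names the directory
            # that holds the level-n files which follow it.
            if not 1 <= level <= len(stack) + 1:
                raise ValueError(f"unexpected directory level for {name}")
            stack = stack[:level - 1] + [name]
            paths.append("/".join(stack))
        else:
            # Level 0 is the unnamed root directory of the file group.
            if not 0 <= level <= len(stack):
                raise ValueError(f"no directory named for level {level}")
            paths.append("/".join(stack[:level] + [name]))
    return paths
```

    The sketch relies on the extensions being ordered, as the FG_LEVEL
    description requires; out-of-order input is rejected rather than
    guessed at.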



Task Specification

    Initially the FG reader/writer programs will be host level, as part
    of the new DHS system, using the existing KWDB interface for FITS keyword 
    manipulation.  Parts of the IRAF HSI, e.g. bootlib and libos, will
    probably be used for things such as following a directory tree.
    The Unix versions of the tasks will be disk file oriented, not tape
    oriented.

    Native IRAF versions of these tasks may follow later, so that we can
    make use of IRAF magtape i/o and support IRAF images.  This is really
    a separate problem though.  For encapsulating foreign files for the
    archive, host level tasks similar to the existing HSI wtar/rtar are
    more what is needed.

    Sample syntax:

	fgwrite <flags> <input-file-template-list>
	fgread  <flags> <input-file>

    We don't need to try to make a completely general file archiver here.
    The intention is mainly to be able to use FITS to carry along and
    archive some non-FITS auxiliary data.  A secondary goal is to generalize
    our FITS writers somewhat so that directories can be handled (archived
    and later restored) as well as linear file templates.

    Since the task will not be a completely general file archiver, we can
    omit certain details:

	symlinks to directories are not followed by the writer
	unlike tar, hard links are not preserved
	special files are ignored

    Selected task options:

	Input-file-template-list is a sequence of file names or directory
	names (if it is a unix task, any templates will already have been
	expanded by the shell).

	There should be an option to fgwrite to specify the types of files to
	be archived; when descending a directory, a file list alone will not
	handle this.  Hence some mechanism, such as naming which of the possible
	supported file types (tbdsf) to include, or a pattern matching template
	such as in "find -name", would be used to select the files to be archived.
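
    A minimal Python sketch of such a selection pass follows, assuming a
    "find -name" style pattern applied to regular files only.  It follows
    the restrictions listed above: symlinks to directories are not
    followed and special files are skipped.  The function name and the
    "file" type label are hypothetical, and the text/binary distinction
    is not attempted here.

```python
import fnmatch
import os
import stat


def select_files(root, pattern="*"):
    """Yield (relative path, kind) for entries under root matching pattern."""
    # os.walk does not follow symlinks to directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(dirnames + filenames):
            full = os.path.join(dirpath, name)
            mode = os.lstat(full).st_mode
            if stat.S_ISLNK(mode):
                kind = "symlink"
            elif stat.S_ISDIR(mode):
                kind = "directory"
            elif stat.S_ISREG(mode):
                kind = "file"
            else:
                continue  # special files are ignored
            # The pattern filters regular files; directories and symlinks
            # are always reported so the tree can be rebuilt.
            if kind == "file" and not fnmatch.fnmatch(name, pattern):
                continue
            yield os.path.relpath(full, root), kind
```

    Hard links need no special handling in this sketch: each name is
    simply reported as a separate regular file, so the link itself is
    not preserved.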



