[fitsbits] Questions about the 'REFERENC' keyword.

Wed Jan 30 11:37:24 EST 2013

On Wed, 30 Jan 2013, Eric Greisen wrote:

> Since no one has responded to your question, I will give an answer of 
> sorts.

I waited for somebody else to show up before giving my opinions.

> Joe Hourcle wrote:

>> 	3. A reference (citation, URL, DOI or bibcode) to a published
>> 	   research article that uses the data.
>> 	4. A URL to a website with documentation on using the
>> 	   data

> When the keyword was invented, only one of the concepts listed above was
> even conceivable - a published article citation.

I tend to agree with Eric, i.e. case (3). A literature reference, a 
bibcode or a DOI should be long-lived. The bibcode or DOI could be 
translated into a URL by prepending the name of some server (e.g. one 
of the present ADS sites), but we have no guarantee that these particular 
servers will still exist when someone looks at the file 20 or 100 years 
from now (while the bibcode or DOI could probably still be used to look 
the reference up elsewhere). Of course, if the expected lifespan of the 
file is much shorter, there is no problem in using a URL.
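
As a minimal sketch of that translation (not a recommendation of any 
particular server): the identifier is what would be stored in the file, 
and the server prefix is applied only at lookup time. The two resolver 
prefixes and both identifiers below are assumptions for illustration; the 
identifiers are made up and do not point to real papers.

```python
from urllib.parse import quote

# Assumed resolver prefixes: these are present-day services and may not
# exist in 20 or 100 years, which is why only the identifier is stored.
RESOLVERS = {
    "doi": "https://doi.org/",
    "bibcode": "https://ui.adsabs.harvard.edu/abs/",
}


def reference_url(kind, identifier, resolvers=RESOLVERS):
    """Prepend a resolver prefix to a persistent identifier."""
    base = resolvers[kind]
    # Percent-encode characters such as '&' (common in bibcodes); keep '/'.
    return base + quote(identifier, safe="/")


# Hypothetical identifiers, used only to show the mechanics.
doi_url = reference_url("doi", "10.1234/example.5678")
bibcode_url = reference_url("bibcode", "2013A&A...000A...1X")
```

If a server disappears, only the RESOLVERS table has to change; the 
identifier recorded in the file stays valid.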

I believe case (4) may also be acceptable, although I'd expect such 
information to show up in comments (if at all).

>> 	1. It's similar to the 'FITS Serialization' in VOTable, where you
>> 	   don't have the data attached, and can instead give a URL:
>> 	2. A URL to the archive or repository are available to
>> 	   download from
>>

Not really clear to me ... (3) and (4) point to a document which REFERS TO 
(INTO ?) the current file. But (1) and (2) seem to refer OUT of the 
current file to elsewhere !

> I do understand the need for a modest sized data "catalog" to allow
> users to browse and then select the data files for downloading.

Hmm, my way to do this sort of thing would be (actually is, I have a 
couple of survey databases organized that way) to have a database 
somewhere ('the catalog') and associated data products linked to a 
particular database column.

If I do a query, it will return a list of "objects", and a list of 
distinct available data products (in the form of URLs). The number of data 
products need not be related to the number n of records returned by the 
query.

For instance, if the data product is a thumbnail image around the object, 
or a spectrum, there will be one for each of the n objects, indexed on the 
object sequence number.

But if the data product is an X-ray image of the entire field where the 
objects are, it could be that the n objects are in just p << n fields, so 
there will be p images, indexed on the field identifier.

The way I do it, the URL is constructed on the fly from templates stored 
in an administrative database, replacing a placeholder with the value of 
the associated index column.
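
A minimal sketch of that substitution, under stated assumptions: the 
"{id}" placeholder syntax, the column names "seq" and "field", and the 
archive.example.org URLs are all hypothetical, and the query result is 
represented as a plain list of dictionaries rather than a real database 
cursor.

```python
def expand_urls(template, rows, index_column, placeholder="{id}"):
    """Return the distinct URLs for a query result, in first-seen order.

    Each URL is the template with the placeholder replaced by the value
    of the index column; rows sharing an index value yield one URL.
    """
    urls = []
    seen = set()
    for row in rows:
        value = str(row[index_column])
        if value in seen:
            continue
        seen.add(value)
        urls.append(template.replace(placeholder, value))
    return urls


# Hypothetical query result: n = 3 objects lying in p = 2 fields.
rows = [
    {"seq": 1, "field": "F01"},
    {"seq": 2, "field": "F01"},
    {"seq": 3, "field": "F02"},
]

# One thumbnail per object, indexed on the object sequence number.
thumbnails = expand_urls("http://archive.example.org/thumb/{id}.fits", rows, "seq")
# One X-ray image per field, indexed on the field identifier.
xray_images = expand_urls("http://archive.example.org/xray/{id}.fits", rows, "field")
```

Here the thumbnail list has one entry per object, while the X-ray list 
has only one entry per distinct field.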

Now what is proposed ? To defer the database search to the home of the 
user who has retrieved the FITS catalog ? Using a custom FITS reader ?

> I suppose you could use dataless FITS files to describe each of the 
> large-data FITS files.

One does not need a dataless file (nor is it an optimal solution) to 
make a "portable FITS database". The right way to go is probably a 
BINTABLE where the *variable part* of the URLs is stored in some column 
(the fixed template could be stored in a custom keyword, and handled by 
the custom reader).  So: a row in a BINTABLE, not a multiple keyword.

It is a different story if the remote user wants to see the data product's 
FITS header before deciding whether to retrieve the bulky data portion.

This could be handled in different ways. One is to have the header and 
data stored separately at the archive site, with URLs pointing to a CGI 
script which either retrieves the dataless header, or merges header and 
data before sending.

Another would be to have a custom client which reads the FITS file only 
up to the END keyword of the header, ignoring the data part. It is not 
difficult to write e.g. a sequential FITS reader in Java (or, though 
different, a Java client which retrieves an ftp file "from record n to 
record m" ... I did it, although it naively reads unwanted records 1 to 
n-1 and simply ignores them without outputting them).
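
To illustrate the first idea, here is a minimal sketch of a sequential 
header reader (in Python rather than Java, and not the client I wrote). 
It relies only on the standard FITS layout of 2880-byte records holding 
80-character cards, and assumes a buffered binary stream whose read() 
returns the full record unless the stream ends. The function name and the 
in-memory test file are made up for the example.

```python
import io

BLOCK = 2880  # FITS logical record size in bytes
CARD = 80     # length of one header card in bytes


def read_primary_header(stream):
    """Read 2880-byte records until the END card; return the cards before it.

    The stream is left positioned at the end of the last header record,
    so the data part is never read.
    """
    cards = []
    while True:
        block = stream.read(BLOCK)
        if len(block) < BLOCK:
            raise ValueError("stream ended before the END card")
        for i in range(0, BLOCK, CARD):
            card = block[i:i + CARD].decode("ascii")
            if card[:8].rstrip() == "END":
                return cards
            cards.append(card)


def _card(text):
    """Pad a card image to 80 bytes (helper for the demonstration only)."""
    return text.ljust(CARD).encode("ascii")


# Hypothetical in-memory file: one header record plus one data record.
header = b"".join(_card(t) for t in [
    "SIMPLE  =                    T",
    "BITPIX  =                    8",
    "NAXIS   =                    0",
    "END",
]).ljust(BLOCK, b" ")
fake_file = io.BytesIO(header + b"\0" * BLOCK)

cards = read_primary_header(fake_file)
bytes_consumed = fake_file.tell()
```

The same loop would work on any file-like object that delivers the bytes 
in order; the point is only that the client can stop reading at END.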

-- 
------------------------------------------------------------------------
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)