[fitsbits] REFERENC keyword, etc.

Joe Hourcle oneiros at grace.nascom.nasa.gov
Thu Jul 23 14:17:04 EDT 2015



On Wed, 22 Jul 2015, van Nieuwenhoven, Richard wrote:

> in other cases (in the java world ;-) ) we used URI protocols for such
> cases:
>
> http://dx.doi.org/10.xxx/xxxxxx -> doi://xxx/xxxxxx
> http://adsabs.harvard.edu/abs/xxxxx -> adsabs://xxxxx
>
> and then write in the fits standard how to resolve the uri to an url. In
> most programming languages this can be done in a library without the
> user's knowing the difference. The user can just open the connection to
> doi://xxx/xxxxxx in his usual way.
>
> This way the reference can be adapted easily if the urls change, without
> the fits files becoming invalid.
>

Like I said in my earlier message, I used to think that way, but the 
problem is that you're favoring long-term stability over ease of use in 
the short term.  CrossRef has an explanation of their revised 
recommendation at:

 	http://www.crossref.org/01company/pr/news080211.html


A few problems with your proposed implementation:

1. If we declare a file to conform to a specific version of the standard,
    readers should use that version, not some future version.  (But the
    most recent version would be needed to know where the current
    resolver is.)

2. To make the file understandable, we would need to either (a) have some
    method to link back to documentation for that particular data, (b) come
    up with some central registry for namespaces used in FITS, or (c) agree
    to only use namespaces in IANA or some other namespace registry.*

    (a) is a cyclic problem, as we'd have to know how to link to that
    data's documentation.  (c) means that we'd be reliant on an external
    body to maintain a list, and would need to register something for
    bibcodes.  (b) means that this group needs to maintain a registry.

3. If there are existing URNs in use in documents, we risk namespace
    collisions if two groups used the same namespace for different types of
    identifiers.**
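
As a rough illustration of points 1 and 3, here is a minimal Python sketch
of the scheme-to-resolver table that a library would have to carry.  The
names RESOLVERS, register and resolve are hypothetical (they are not from
any FITS library), and the URL prefixes are simply the ones quoted above.

```python
# Hypothetical scheme -> resolver-prefix table; illustrative only.
# Whatever version of the standard the reader implements fixes this table.
RESOLVERS = {
    "doi": "http://dx.doi.org/",
    "adsabs": "http://adsabs.harvard.edu/abs/",
}


def register(scheme, prefix, table=RESOLVERS):
    """Add a scheme, refusing to silently redefine an existing one."""
    existing = table.get(scheme)
    if existing is not None and existing != prefix:
        raise ValueError(
            f"namespace collision: {scheme!r} already maps to {existing!r}"
        )
    table[scheme] = prefix


def resolve(uri, table=RESOLVERS):
    """Turn 'scheme:identifier' (or 'scheme://identifier') into a URL."""
    scheme, sep, identifier = uri.partition(":")
    if not sep or scheme not in table:
        raise KeyError(f"no resolver known for {uri!r}")
    return table[scheme] + identifier.lstrip("/")
```

A reader built against an old table keeps sending 'doi:' to the old
prefix, and a second group calling register("doi", ...) with a different
prefix is exactly the collision described in point 3.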

> Of course manual readers, will have to lookup the ref in the documentation.

Even for non-manual readers.

I mean, personally, if I had assigned URNs to bibcodes, I would've used 
'bibcode:'.  ADS is a service that has a few different interfaces, and I 
wouldn't be surprised if they have more than one catalog.  (E.g., they do 
lookups by name, so they might also assign identifiers to authors in the 
future to help with disambiguation.)

You might know what 'doi' is, but if we allow each data producer to define 
their own URNs, someone could define 'doi' with a different resolver.  For 
instance, it could actually be the 'short DOI' resolver: ***

 	http://doi.org/...

...


Some other alternatives would be:

 	1. Use ARKs (Archive Resource Keys):
 		https://wiki.ucop.edu/display/Curation/ARK

 	   An ARK-aware client would handle the remapping of resolvers.

 	2. Include both a URN and URL for every reference.  The URL would
 	   be known to work at the time of file preparation, while the URN
 	   would serve as a longer-term identifier should the resolver go
 	   down.
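
The second alternative could look something like the Python sketch below
on the reader's side.  Everything here is an assumption for illustration:
the Reference record, the locate function, and the two callables the
caller passes in (one to test whether a URL still works, one to resolve a
URN by whatever means the client has).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Hypothetical record holding both forms of one reference."""
    url: str  # known to work when the file was prepared
    urn: str  # longer-term identifier, e.g. "doi:10.xxx/xxxxxx"


def locate(ref, is_reachable, resolve_urn):
    """Prefer the recorded URL; fall back to the URN if it no longer works.

    is_reachable(url) -> bool and resolve_urn(urn) -> str are supplied by
    the caller, so this sketch makes no network calls of its own.
    """
    if is_reachable(ref.url):
        return ref.url
    return resolve_urn(ref.urn)
```

The short-term user gets a link that just works, and the URN is only
consulted once the original resolver has gone away.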


-Joe


* http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
   DOI is registered; bibcodes are not.  And the registry doesn't have any
   information about what to actually do with a given URI namespace.

** This was a huge PITA with RPC/Encoded SOAP.  Just because you saw
    'soap:' or 'xsi:', you didn't know what specific version of the
    standard it was referencing.  I don't remember what it was called, but
    for compatibility, you often had to expand identifiers within the XML
    document as "{namespace-URL}:identifier".

*** Which I hope no one would ever use in a publication, as it's compacted
     such that you can't identify the naming authority from the string.


-----
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center


