[fitsbits] UTF-8 in BINTABLE String Columns {External}
Francois-Xavier PINEAU
francois-xavier.pineau at astro.unistra.fr
Tue Apr 7 10:45:38 EDT 2026
Dear Bill,
In the following document:
https://fits.gsfc.nasa.gov/users_guide/users_guide/node7.html,
it is written:
"Changes in the FITS rules may add new structures that old software cannot
handle. Revised software will be required for new standard extensions, but
revising a software package is a far smaller effort than updating a full
data library would be.
As far as is possible, however, FITS should be expanded in such a way
that the old software will still be able to process those parts of the file
which it is capable of handling. In such a case, software should not fail
or give incorrect results when confronted with the new extension or
conventions; it should simply ignore them and continue to process
those parts of the file that it can understand."
The phrase "As far as is possible" is open to interpretation, and it seems
from your message that François may be interpreting it somewhat strictly.
I would therefore like to join Mark Taylor in thanking you for the
clarification, and I agree, as did Arnold, that solution 1 (adopted in
VOTable) is "fully consistent with this 'once FITS always FITS' rule".
Adopting it may require software to be updated (or at least tested
with UTF-8 data): I don't know how current software would behave when
it encounters non-ASCII UTF-8 code points in a "TFORM = xA" column...
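To illustrate the question, here is a hypothetical Python sketch of three ways a reader that assumes ASCII could react to the UTF-8 string "Université" (11 bytes, since 'é' takes two) stored in a 12A field. `legacy_read` and its policy names are invented for this example; which behaviour, if any, a given FITS library actually shows is precisely what would need testing.

```python
# Hypothetical sketch: how an ASCII-only reader *might* react to UTF-8 bytes
# found in a TFORMn = 'rA' field. Not the behaviour of any specific library.

def legacy_read(field: bytes, policy: str) -> str:
    """Decode an rA field as an ASCII-only reader could, under a given policy."""
    # The FITS Standard allows an ASCII NUL to terminate the string early.
    raw = field.split(b"\x00", 1)[0]
    if policy == "strict":
        # Raises UnicodeDecodeError on any byte >= 0x80.
        return raw.decode("ascii")
    if policy == "replace":
        # Each non-ASCII byte becomes U+FFFD (one per byte, not per character).
        return raw.decode("ascii", errors="replace")
    if policy == "latin-1":
        # Every byte maps to some character: no error, but the text is garbled.
        return raw.decode("latin-1")
    raise ValueError(f"unknown policy: {policy!r}")


field = "Université".encode("utf-8").ljust(12, b"\x00")  # 11 bytes + 1 NUL

for policy in ("strict", "replace", "latin-1"):
    try:
        print(policy, repr(legacy_read(field, policy)))
    except UnicodeDecodeError as exc:
        print(policy, "->", type(exc).__name__)
```

With this input, the three policies respectively raise `UnicodeDecodeError`, return `'Universit\ufffd\ufffd'` (two replacement characters, one per byte of 'é'), and return the mis-decoded `'UniversitÃ©'`.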
Dear Rob,
As confirmed by Mark Taylor, the proposal concerns UTF-8 only in
BINTABLE (or ASCII TABLE) columns, not in the header.
For TFORM = 'rA', TFORM = 'PA(emax)' and TFORM = 'QA(emax)',
the values of 'r' and 'emax', and the stored "number of elements",
would need to be interpreted as numbers of bytes
(for an ASCII string this is simply the number of characters, since an
ASCII string is also an ASCII-only UTF-8 string).
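A minimal Python sketch of this byte-count interpretation for a fixed-width `rA` field follows. The helper names `encode_rA` and `decode_rA` are hypothetical, and padding with ASCII NUL is one option among those the FITS Standard allows (blank padding is also common). ASCII-only data is unaffected: "Strasbourg" is 10 characters and 10 bytes, whereas "Université" is 10 characters but 11 bytes and therefore no longer fits in a 10A field.

```python
def encode_rA(value: str, r: int) -> bytes:
    """Encode a string into a TFORMn = 'rA' field, with r counted in bytes."""
    data = value.encode("utf-8")
    if len(data) > r:
        raise ValueError(f"{len(data)} UTF-8 bytes do not fit in a {r}A field")
    # Pad with ASCII NUL, which the FITS Standard accepts as an early terminator.
    return data + b"\x00" * (r - len(data))


def decode_rA(field: bytes) -> str:
    """Read back an rA field: stop at the first NUL, then decode as UTF-8."""
    # For ASCII-only content, UTF-8 decoding gives exactly the ASCII result.
    return field.split(b"\x00", 1)[0].decode("utf-8")


print(len("Université"), len("Université".encode("utf-8")))
print(decode_rA(encode_rA("Université", 12)))
```

For the variable-length forms `PA(emax)` and `QA(emax)`, the element count stored in the array descriptor would, under the same reading, be `len(value.encode("utf-8"))` rather than `len(value)`.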
On 04/04/2026 at 20:19, William Pence via fitsbits wrote:
> Roughly translated, Francois wrote: “it would have seemed important to keep the distinction between pure ascii and utf-8, because of the FITS logic to ensure that changes in the standard should not affect existing applications would be violated”.
>
> Just to be clear, there has never been a requirement that changes to the definition of the FITS format must not cause problems for existing software applications. The actual requirement, as stated in section 3.7 of the FITS Standard is that “Any structure that is a valid FITS structure shall remain a valid FITS structure at all future times” (often referred to as the “once FITS, always FITS” rule).
>
> Bill
>
>
>> On Apr 3, 2026, at 11:16 PM, Francois Ochsenbein via fitsbits<fitsbits at listmgr.nrao.edu> wrote:
>>
>>
>> Hello FX,
>>
>> Apparently the "current draft of VOTable 1.6" does not exist?
>> Personally, I would have thought it important to keep the
>> distinction between pure ASCII and UTF-8, because otherwise the FITS
>> principle that changes to the standard must not affect existing
>> applications would be violated... But I would be curious to know
>> why HATS needs UTF-8?
>>
>> Wishing you a nice weekend, and perhaps see you soon?
>>
>> Best wishes, François
>>
>> ==> On Thursday 2026-03-26 at 14:43+0100,
>> Francois-Xavier PINEAU via fitsbits <fitsbits at listmgr.nrao.edu>
>> wrote:
>>
>>> Dear fitsbits,
>>>
>>>
>>> # Background
>>>
>>> VOTable (v1.5) is closely compatible with the FITS Binary Table format:
>>> https://www.ivoa.net/documents/VOTable/20250116/REC-VOTable-1.5.html#tth_sEc2.3
>>>
>>> In the current draft of VOTable 1.6
>>> (https://github.com/ivoa-std/VOTable/releases/download/auto-pdf-preview/VOTable-draft.pdf),
>>> UTF-8 strings replace the previous ASCII-only strings.
>>>
>>> If FITS cannot store UTF-8, *lossless round-trip conversion from
>>> VOTable to FITS will no longer be possible*.
>>> Some limitations already exist (e.g., unsigned integer logical types),
>>> but UTF-8 seems more critical.
>>>
>>> Personal use cases include using HEALPix-sorted and indexed
>>> BINTABLEs to build on-the-fly HATS products
>>> or intermediate HiPS catalogue representations from VizieR data (which
>>> will contain more and more UTF-8).
>>> * HATS: https://www.ivoa.net/documents/Notes/HATS/
>>> * HiPS catalogue: https://www.ivoa.net/documents/HiPS/
>>> * VizieR: https://vizier.u-strasbg.fr/
>>>
>>> # Possible Solutions
>>>
>>> ## 1. Use UTF-8 in existing `TFORMn=rA`
>>>
>>> As in VOTable 1.6, interpret `r` as bytes instead of characters.
>>> May break truncation operations (TDISP) if a multi-byte UTF-8
>>> character is split.
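A hypothetical Python sketch of the problem and of one workaround, assuming the truncation width is counted in bytes (how TDISPn widths would apply to UTF-8 data is not settled here): cutting "Université" naively at 10 bytes keeps only the first byte of 'é' and yields invalid UTF-8, whereas backing up to the previous character boundary keeps the result decodable. `truncate_utf8` is an illustrative name, not an existing API.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Shorten UTF-8 bytes to at most max_bytes without splitting a character."""
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If data[cut] is one,
    # the cut would fall inside a character, so move it back to a lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


data = "Université".encode("utf-8")   # 11 bytes: 'é' is encoded as C3 A9
try:
    data[:10].decode("utf-8")         # naive cut keeps only C3
except UnicodeDecodeError:
    print("naive truncation produced invalid UTF-8")
print(truncate_utf8(data, 10).decode("utf-8"))
```

Because UTF-8 is self-synchronising, the loop steps back at most three bytes on valid input, so the fix costs almost nothing compared with a plain slice.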
>>>
>>> ## 2. Logical type "UTF-8" backed by a byte array
>>>
>>> TFORMn = rB
>>> TLOGTn = 'UTF-8' / LOGT stands for LOGical Type
>>>
>>> Unaware readers see a byte array; UTF-8 aware readers interpret it as
>>> a string.
>>> Introduces two string types in FITS (ASCII and UTF-8).
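A minimal Python sketch of how the two kinds of reader would diverge on one fixed-width `rB` cell. `TLOGTn` is the keyword proposed above, not part of the current FITS Standard; `read_cell`, the `utf8_aware` flag, and treating trailing NUL bytes as padding are assumptions made for this illustration.

```python
def read_cell(header: dict, n: int, raw: bytes, utf8_aware: bool = True):
    """Return one fixed-width cell of column n, as str or as a list of byte values."""
    tform = str(header[f"TFORM{n}"]).strip().upper()
    tlogt = str(header.get(f"TLOGT{n}", "")).strip()  # proposed keyword, not standard
    if utf8_aware and tform.endswith("B") and tlogt == "UTF-8":
        # UTF-8-aware reader: treat the byte array as a string.
        # Assumption: unused trailing bytes are NUL padding.
        return raw.rstrip(b"\x00").decode("utf-8")
    # Unaware reader: the column is just an array of unsigned bytes.
    return list(raw)


header = {"TFORM1": "12B", "TLOGT1": "UTF-8"}
raw = "Université".encode("utf-8") + b"\x00"  # 11 + 1 = 12 bytes

print(read_cell(header, 1, raw))                    # aware reader
print(read_cell(header, 1, raw, utf8_aware=False))  # unaware reader
```

The unaware path cannot fail, which is what makes this option safe; the cost is that such readers present a list of integers instead of text.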
>>>
>>> ## 3. New TFORM type (e.g., `TFORMn=rU`)
>>>
>>> Definite breakage for current readers.
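The reason is visible in a hypothetical Python sketch of a TFORMn parser that, like a reader written against the current Standard, knows only the existing binary-table type codes (L, X, B, I, J, K, A, E, D, C, M, P, Q). `parse_tform` ignores everything after the type code, such as the `(emax)` part of variable-length descriptors. This sketch raises an error on `20U`; a real reader might instead reject the whole table or skip the column.

```python
import re

# Binary-table type codes known to a reader written against the current Standard.
KNOWN_CODES = frozenset("LXBIJKAEDCMPQ")

# rT...: optional repeat count r, then a one-letter type code T.
_TFORM_RE = re.compile(r"\s*(\d*)([A-Z])")


def parse_tform(tform: str) -> tuple[int, str]:
    """Return (repeat, type_code) for a TFORMn value, or raise ValueError."""
    m = _TFORM_RE.match(tform.upper())
    if m is None:
        raise ValueError(f"malformed TFORM: {tform!r}")
    repeat = int(m.group(1)) if m.group(1) else 1  # r defaults to 1
    code = m.group(2)
    if code not in KNOWN_CODES:
        raise ValueError(f"unsupported TFORM type code {code!r} in {tform!r}")
    return repeat, code


for value in ("20A", "1PA(80)", "20U"):
    try:
        print(value, parse_tform(value))
    except ValueError as exc:
        print(value, "->", exc)
```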
>>>
>>>
>>> # Existing Implementations
>>>
>>> * TOPCAT/STILTS (Java): Prototype supports Solutions 1 and 2 for
>>> read/write (private communication with Mark Taylor).
>>> * fitstable (Rust): Supports Solutions 1 and 2 for reading
>>> (https://github.com/cds-astro/cds-fitstable-rust).
>>> * VizieR: Appears to provide UTF-8 in TFORMn=rA columns (Solution 1).
>>> * ??
>>>
>>>
>>> # Feedback Requested
>>>
>>> I am curious about:
>>> * other possible approaches
>>> * fitsbits opinions on the most practical solution
>>> * other people interested in having UTF-8 in BINTABLE columns
>>>
>>> Currently, Solution 1 seems the simplest and Solution 2 the safest,
>>> but I welcome constructive comments and experience from the community.
>>>
>>> Best regards,
>>>
>>
>> --
>> ======================================================================
>> FrancoisOchsenbein at free.fr --- 67380 Lingolsheim
>> ======================================================================
>>
>> _______________________________________________
>> fitsbits mailing list
>> fitsbits at listmgr.nrao.edu
>> https://listmgr.nrao.edu/mailman/listinfo/fitsbits
>
--
Francois-Xavier Pineau
Research Engineer
Tel: +33 (0)3 68 85 24 14
francois-xavier.pineau at astro.unistra.fr
Centre de Données astronomiques de Strasbourg (CDS)
11, rue de l'Université - E03