[fitsbits] UTF-8 in BINTABLE String Columns {External}
Francois-Xavier PINEAU
francois-xavier.pineau at astro.unistra.fr
Thu Mar 26 09:43:07 EDT 2026
Dear fitsbits,
# Background
VOTable (v1.5) is closely compatible with the FITS Binary Table format:
https://www.ivoa.net/documents/VOTable/20250116/REC-VOTable-1.5.html#tth_sEc2.3
In the current draft of VOTable 1.6
(https://github.com/ivoa-std/VOTable/releases/download/auto-pdf-preview/VOTable-draft.pdf),
UTF-8 strings replace the previous ASCII-only strings.
If FITS cannot store UTF-8, *lossless round-trip conversion from VOTable
to FITS will no longer be possible*.
Some limitations already exist (e.g., unsigned integer logical types),
but UTF-8 seems more critical.
Personal use cases include using HEALPix-sorted and indexed
BINTABLEs to build HATS products on the fly,
or intermediate HiPS catalogue representations, from VizieR data (which
will contain more and more UTF-8).
* HATS: https://www.ivoa.net/documents/Notes/HATS/
* HiPS catalogue: https://www.ivoa.net/documents/HiPS/
* VizieR: https://vizier.u-strasbg.fr/
# Possible Solutions
## 1. Use UTF-8 in existing `TFORMn=rA`
As in VOTable 1.6, interpret `r` as a number of bytes instead of a
number of characters.
May break truncation operations (TDISP) if a multi-byte UTF-8 character
is split.
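
To illustrate the truncation issue, here is a minimal Python sketch of a
byte-count truncation that never cuts inside a multi-byte character. The
function name `truncate_utf8` is hypothetical and not taken from any of the
implementations listed in this message; it only relies on the fact that
UTF-8 continuation bytes have the bit pattern `10xxxxxx`.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Truncate UTF-8 encoded bytes to at most max_bytes,
    backing up so that no multi-byte character is split."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # If the first dropped byte is a continuation byte (0b10xxxxxx),
    # the cut falls inside a character: step back to its lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


# 'ö' takes 2 bytes in UTF-8, so the naive cut data[:4] would end
# with a dangling lead byte and fail to decode.
name = "Strömgren".encode("utf-8")
print(truncate_utf8(name, 4))  # b'Str'
```

A writer or display layer that truncates to `r` bytes would need a check of
this kind, whereas an ASCII-only tool cutting at an arbitrary byte offset
may produce invalid UTF-8.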
## 2. Logical type "UTF-8" backed by a byte array
TFORMn = rB
TLOGTn = 'UTF-8' / LOGT stands for LOGical Type
Unaware readers see a byte array; UTF-8 aware readers interpret it as a
string.
Introduces two string types in FITS (ASCII and UTF-8).
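
A rough sketch of how a reader might dispatch on the proposed keyword is
given below. It is only an illustration: `TLOGTn` is the keyword proposed
in this message and not part of the FITS standard, the header is modelled
as a plain `dict`, the function name `read_field` is hypothetical, and
treating a NUL byte as the end of the string in a UTF-8 field is an
assumption.

```python
def read_field(header: dict, n: int, raw: bytes):
    """Interpret the raw bytes of column n according to the header."""
    tform = header[f"TFORM{n}"].strip()
    logical = header.get(f"TLOGT{n}", "").strip().upper()
    if tform.endswith("B") and logical == "UTF-8":
        # UTF-8 aware path: stop at the first NUL (assumed padding),
        # then decode the remaining bytes as UTF-8.
        return raw.split(b"\x00", 1)[0].decode("utf-8")
    if tform.endswith("A"):
        # Classic FITS string: ASCII, optionally NUL-terminated.
        return raw.split(b"\x00", 1)[0].decode("ascii").rstrip(" ")
    # Unaware path: the column stays a plain byte array.
    return raw


raw = "Tél".encode("utf-8").ljust(8, b"\x00")  # 4 bytes of text + padding
aware = read_field({"TFORM1": "8B", "TLOGT1": "UTF-8"}, 1, raw)
unaware = read_field({"TFORM1": "8B"}, 1, raw)
print(aware)    # Tél
print(unaware)  # the untouched 8-byte array
```

The point of the sketch is that a reader ignoring `TLOGTn` still gets valid,
lossless data (a byte array), which is why this option looks safer than
reusing `rA`.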
## 3. New TFORM type (e.g., `TFORMn=rU`)
Definite breakage for current readers.
# Existing Implementations
* TOPCAT/STILTS (Java): Prototype supports Solutions 1 and 2 for
read/write (private communication with Mark Taylor).
* fitstable (Rust): Supports Solutions 1 and 2 for reading
(https://github.com/cds-astro/cds-fitstable-rust).
* VizieR: Appears to provide UTF-8 in TFORMn=rA columns (Solution 1).
* ??
# Feedback Requested
I am curious about:
* other possible approaches
* fitsbits opinions on the most practical solution
* other people interested in having UTF-8 in BINTABLE columns
Currently, Solution 1 seems the simplest and Solution 2 the safest,
but I welcome constructive comments and experience from the community.
Best regards,
--
Francois-Xavier Pineau
Ingénieur de Recherche
Tél : +33 (0)3 68 85 24 14,
francois-xavier.pineau at astro.unistra.fr
Centre de Données astronomiques de Strasbourg (CDS)
11, rue de l'Université - E03