[fitsbits] UTF-8 in BINTABLE String Columns

Francois-Xavier PINEAU francois-xavier.pineau at astro.unistra.fr
Thu Mar 26 09:43:07 EDT 2026


Dear fitsbits,


# Background

VOTable (v1.5) is closely compatible with the FITS Binary Table format:
https://www.ivoa.net/documents/VOTable/20250116/REC-VOTable-1.5.html#tth_sEc2.3

In the current draft of VOTable 1.6
(https://github.com/ivoa-std/VOTable/releases/download/auto-pdf-preview/VOTable-draft.pdf),
UTF-8 strings replace the previous ASCII-only strings.

If FITS cannot store UTF-8, *lossless round-trip conversion from VOTable 
to FITS will no longer be possible*.
Some limitations already exist (e.g., unsigned integer logical types), 
but UTF-8 seems more critical.

Personal use cases include using HEALPix-sorted and indexed BINTABLEs
to build on-the-fly HATS products or intermediary HiPS catalogue
representations from VizieR data (which will contain more and more
UTF-8).
* HATS: https://www.ivoa.net/documents/Notes/HATS/
* HiPS catalogue: https://www.ivoa.net/documents/HiPS/
* VizieR: https://vizier.u-strasbg.fr/

# Possible Solutions

## 1. Use UTF-8 in existing `TFORMn=rA`

As in VOTable 1.6, interpret `r` as a number of bytes instead of a
number of characters.
This may break truncation operations (e.g. TDISPn) if a multi-byte UTF-8
character is split.
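
To illustrate the truncation issue, here is a minimal Python sketch. The helper name `truncate_utf8` is hypothetical and not taken from any existing FITS library; it assumes the input is valid UTF-8 and simply moves the cut back to the nearest character boundary.

```python
def truncate_utf8(raw: bytes, width: int) -> bytes:
    """Truncate to at most `width` bytes without splitting a UTF-8 character.

    Hypothetical helper for illustration; assumes `raw` is valid UTF-8.
    """
    if len(raw) <= width:
        return raw
    end = width
    # Continuation bytes look like 10xxxxxx. If the first dropped byte is a
    # continuation byte, the cut falls inside a character, so step back.
    while end > 0 and (raw[end] & 0xC0) == 0x80:
        end -= 1
    return raw[:end]


value = "Strömgren".encode("utf-8")  # 'ö' takes two bytes: 0xC3 0xB6
naive = value[:4]                     # plain byte truncation splits 'ö'
safe = truncate_utf8(value, 4)        # cut is moved back before 'ö'

try:
    naive.decode("utf-8")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False
```

With a 4-byte width, the naive slice ends in the middle of 'ö' and no longer decodes, while the boundary-aware version returns a shorter but valid string.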

## 2. Logical type "UTF-8" backed by a byte array

TFORMn = rB
TLOGTn = 'UTF-8'  / LOGT stands for LOGical Type

Unaware readers see a byte array; UTF-8 aware readers interpret it as a 
string.
Introduces two string types in FITS (ASCII and UTF-8).
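
A possible reader-side behaviour for this solution can be sketched in Python as follows. The function `read_cell` and the plain `dict` used as a header are hypothetical, and the NUL padding of short values is an assumption made for this sketch, not something specified by the proposal.

```python
def read_cell(header: dict, n: int, raw: bytes):
    """Return a str if column n is tagged TLOGTn = 'UTF-8', else raw bytes.

    Hypothetical sketch: `header` is a plain dict of keyword -> value.
    """
    logical = str(header.get(f"TLOGT{n}", "")).strip().upper()
    if logical == "UTF-8":
        # Assumption: values shorter than the field are padded with NULs.
        return raw.rstrip(b"\x00").decode("utf-8")
    # Unaware (or untagged) case: the column stays a byte array.
    return raw


header = {"TFORM1": "12B", "TLOGT1": "UTF-8"}
raw = "Université".encode("utf-8").ljust(12, b"\x00")  # 11 bytes + 1 NUL

aware = read_cell(header, 1, raw)                # decoded string
unaware = read_cell({"TFORM1": "12B"}, 1, raw)   # untouched bytes
```

The same 12 bytes are returned unchanged when the keyword is absent, which is what an unaware reader would see.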

## 3. New TFORM type (e.g., `TFORMn=rU`)

Definite breakage for current readers, which do not know the new type
code.
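
As an illustration of why existing readers would fail, here is a minimal Python sketch of a TFORM parser that only accepts the binary table type codes of the current standard. The function `parse_tform` is hypothetical and much simpler than a real parser (it ignores everything after the type code).

```python
import re

# Type codes defined for BINTABLE columns in the current FITS standard.
BINTABLE_CODES = "LXBIJKAEDCMPQ"
_TFORM = re.compile(r"^(\d*)([A-Z])(.*)$")


def parse_tform(tform: str):
    """Return (repeat, type_code); reject type codes the reader does not know."""
    m = _TFORM.match(tform.strip())
    if m is None or m.group(2) not in BINTABLE_CODES:
        raise ValueError(f"unsupported TFORM value: {tform!r}")
    repeat = int(m.group(1)) if m.group(1) else 1  # repeat count defaults to 1
    return repeat, m.group(2)


ascii_column = parse_tform("16A")   # accepted today

try:
    parse_tform("16U")              # hypothetical new code from Solution 3
    new_code_accepted = True
except ValueError:
    new_code_accepted = False
```

A reader built this way rejects `16U` outright, whereas Solutions 1 and 2 only use type codes it already accepts.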


# Existing Implementations

  * TOPCAT/STILTS (Java): Prototype supports Solutions 1 and 2 for 
read/write (private communication with Mark Taylor).
  * fitstable (Rust): Supports Solutions 1 and 2 for reading 
(https://github.com/cds-astro/cds-fitstable-rust).
  * VizieR: Appears to provide UTF-8 in TFORMn=rA columns (Solution 1).
  * ??


# Feedback Requested

I am curious about:
  * other possible approaches
  * fitsbits opinions on the most practical solution
  * other people interested in having UTF-8 in BINTABLE columns

Currently, Solution 1 seems the simplest and Solution 2 the safest,
but I welcome constructive comments and experience from the community.

Best regards,

-- 

Francois-Xavier Pineau
Ingénieur de Recherche
Tél : +33 (0)3 68 85 24 14,
francois-xavier.pineau at astro.unistra.fr

Centre de Données astronomiques de Strasbourg (CDS)
11, rue de l'Université - E03
