[fitsbits] UTF-8 in BINTABLE String Columns {External}

James Tocknell james.tocknell at mq.edu.au
Sun Mar 29 19:59:50 EDT 2026


I'm not sure there's any value in supporting UTF-16 or UTF-32. https://utf8everywhere.org/ explains why UTF-8 should be the standard interchange format (basically, for mostly-ASCII text both take up more space, and both encourage misconceptions about Unicode). Also, for things like paths on Windows (as opposed to Unix systems, where a path is just 8-bit bytes in some encoding), you can't rely on valid UTF-16 anyway, because Windows paths may contain unpaired surrogates (see https://wtf-8.codeberg.page/). Practically speaking, if something accepts ASCII it will probably accept UTF-8 (and someone has likely already slipped in Latin-1 unless people are validating that the data is ASCII-only); that is not true of UTF-16 or UTF-32.
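To make the size and compatibility argument concrete, here is a small sketch using only Python's standard codecs. The sample strings are arbitrary illustrations, not taken from any catalogue, and the `-le` codecs are chosen so that no byte-order mark is added:

```python
# Compare encoded sizes and ASCII compatibility of UTF-8, UTF-16 and UTF-32.
# The "-le" codecs are used so Python does not prepend a byte-order mark.

ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(text: str) -> dict[str, int]:
    """Return the number of bytes `text` occupies in each encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}

def is_ascii_compatible(encoding: str, text: str = "NGC 1068") -> bool:
    """True if ASCII-only `text` encodes to exactly the same bytes as in ASCII."""
    return text.encode(encoding) == text.encode("ascii")

if __name__ == "__main__":
    for sample in ("NGC 1068", "Äquator", "α Cen", "銀河"):
        print(f"{sample!r}: {encoded_sizes(sample)}")
    for enc in ENCODINGS:
        print(enc, is_ascii_compatible(enc))
    # UTF-16 puts a zero byte after each ASCII letter.
    print("NGC".encode("utf-16-le"))  # b'N\x00G\x00C\x00'
```

For "NGC 1068" the sizes are 8, 16 and 32 bytes; for "銀河" they are 6, 4 and 8, so UTF-16 is not always the larger one. What UTF-16 and UTF-32 always give up is ASCII compatibility: their zero bytes could be misread by an ASCII-oriented reader, for example as the NUL that may terminate a FITS string.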

James

________________________________________
From: fitsbits <fitsbits-bounces at listmgr.nrao.edu> on behalf of Barrett, Paul via fitsbits <fitsbits at listmgr.nrao.edu>
Sent: Friday, 27 March 2026 1:00 AM
To: Francois-Xavier PINEAU
Cc: fitsbits at nrao.edu
Subject: Re: [fitsbits] UTF-8 in BINTABLE String Columns {External}

Because this is somewhat of a breaking change, would it not be beneficial in the long run to extend this to UTF-16 and UTF-32?

 -- Paul


On Thu, Mar 26, 2026 at 9:44 AM Francois-Xavier PINEAU via fitsbits <fitsbits at listmgr.nrao.edu> wrote:

Dear fitsbits,

# Background

VOTable (v1.5) is closely compatible with the FITS Binary Table format:
https://www.ivoa.net/documents/VOTable/20250116/REC-VOTable-1.5.html#tth_sEc2.3

In the current draft of VOTable 1.6
https://github.com/ivoa-std/VOTable/releases/download/auto-pdf-preview/VOTable-draft.pdf,
UTF-8 strings replace the previous ASCII-only strings.

If FITS cannot store UTF-8, lossless round-trip conversion from VOTable to FITS will no longer be possible.
Some limitations already exist (e.g., unsigned integer logical types), but UTF-8 seems more critical.

Personal use cases include using HEALPix-sorted and indexed BINTABLEs to build, on the fly, HATS products
or intermediary HiPS catalogue representations from VizieR data (which will contain more and more UTF-8).
* HATS: https://www.ivoa.net/documents/Notes/HATS/
* HiPS catalogue: https://www.ivoa.net/documents/HiPS/
* VizieR: https://vizier.u-strasbg.fr/


# Possible Solutions

## 1. Use UTF-8 in existing `TFORMn=rA`

As in VOTable 1.6, interpret `r` as a number of bytes instead of characters.
This may break truncation operations (e.g. TDISPn) if a multi-byte UTF-8 character is split.
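Under Solution 1, a writer that shortens a string to fit an `r`-byte field has to avoid cutting inside a multi-byte character. A minimal sketch, assuming the field is NUL-padded (the FITS standard allows a NUL to end an `rA` string early); `truncate_utf8` and `to_fits_field` are hypothetical helper names, not part of any existing library:

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode `text` as UTF-8 and cut it to at most `max_bytes`
    without splitting a multi-byte character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # UTF-8 continuation bytes match 10xxxxxx (0x80-0xBF). Back up until
    # data[cut] starts a character, so data[:cut] ends on a boundary.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]

def to_fits_field(text: str, width: int) -> bytes:
    """Pack `text` into a `width`-byte field, NUL-padded (hypothetical helper)."""
    return truncate_utf8(text, width).ljust(width, b"\x00")
```

A naive byte slice such as `"α Cen".encode("utf-8")[:1]` yields a lone lead byte that no longer decodes; `truncate_utf8("α Cen", 1)` instead returns an empty byte string, and the result of `truncate_utf8` always decodes cleanly.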

## 2. Logical type "UTF-8" backed by a byte array

TFORMn = rB
TLOGTn = 'UTF-8'  / LOGT stands for LOGical Type

Unaware readers see a byte array; UTF-8 aware readers interpret it as a string.
Introduces two string types in FITS (ASCII and UTF-8).
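The reader-side behaviour of Solution 2 can be sketched as follows. `TLOGTn` is the keyword proposed in this message, not part of the current FITS standard; the header is modelled as a plain dict, and treating a NUL as an early string terminator is an assumption borrowed from `rA` fields:

```python
# Sketch of a reader for Solution 2. TLOGTn is the keyword proposed in this
# thread, NOT part of the FITS standard; the header is modelled as a plain dict.

def read_cell(header: dict, col: int, raw: bytes):
    """Return str for a 'UTF-8' logical-type byte column, else the raw bytes."""
    tform = header[f"TFORM{col}"].strip()
    logical = header.get(f"TLOGT{col}", "").strip().upper()
    if tform.endswith("B") and logical == "UTF-8":
        # Assumption: a NUL ends the string early, as for rA fields.
        text_bytes = raw.split(b"\x00", 1)[0]
        # Strict decoding raises UnicodeDecodeError on invalid UTF-8.
        return text_bytes.decode("utf-8")
    # Unknown or absent logical type: keep the plain byte array.
    return raw

def read_cell_unaware(raw: bytes) -> list[int]:
    """What a reader that ignores TLOGTn sees: unsigned 8-bit integers."""
    return list(raw)
```

The same bytes therefore stay readable everywhere. An aware reader gets `"α Cen"`, while an unaware one gets `[206, 177, 32, ...]`, which is lossless but not human-friendly.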

## 3. New TFORM type (e.g., `TFORMn=rU`)

Definitely breaks current readers, which do not recognize the new type code.
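Why a new code breaks existing readers so thoroughly: in a BINTABLE, each column's byte offset within a row is the sum of the widths of the preceding columns, so a reader that does not know how wide a `U` element is generally cannot locate any later column either. A simplified sketch of a strict `TFORMn` parser (only the leading `rT` part is parsed; descriptor details such as `rPt(max)` are ignored), using the type codes listed in the FITS 4.0 standard:

```python
import re

# BINTABLE data type codes defined in FITS 4.0.
KNOWN_CODES = set("LXBIJKAEDCMPQ")

# Optional repeat count followed by a single type letter.
TFORM_RE = re.compile(r"^\s*(\d*)([A-Z])")

def parse_tform(tform: str) -> tuple[int, str]:
    """Return (repeat, code); raise ValueError for unknown type codes."""
    m = TFORM_RE.match(tform)
    if m is None:
        raise ValueError(f"malformed TFORM: {tform!r}")
    repeat = int(m.group(1)) if m.group(1) else 1
    code = m.group(2)
    if code not in KNOWN_CODES:
        raise ValueError(f"unsupported TFORM type code {code!r}")
    return repeat, code
```

With this parser, `parse_tform("20A")` returns `(20, "A")`, whereas `parse_tform("20U")` raises `ValueError`, and the whole table becomes unreadable rather than just one column.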


# Existing Implementations

 * TOPCAT/STILTS (Java): Prototype supports Solutions 1 and 2 for read/write (private communication with Mark Taylor).
 * fitstable (Rust): Supports Solutions 1 and 2 for reading (https://github.com/cds-astro/cds-fitstable-rust).
 * VizieR: Appears to provide UTF-8 in TFORMn=rA columns (Solution 1).
 * ??
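Since files such as the VizieR ones may already carry UTF-8 (or even Latin-1) in `rA` columns, a reader may want to classify each field before decoding it. A minimal sketch; `classify_field` is a hypothetical helper, and anything that is neither ASCII nor valid UTF-8 is simply reported as "other", since the actual legacy encoding cannot be determined reliably:

```python
def classify_field(raw: bytes) -> str:
    """Classify the bytes of an rA field (up to the first NUL)."""
    data = raw.split(b"\x00", 1)[0]
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: possibly Latin-1 or another legacy encoding.
        return "other"
```

For example, "Äquator" encoded as UTF-8 is classified as `"utf-8"`, while the same word encoded as Latin-1 (`b"\xc4quator"`) is `"other"`, because `0xC4` announces a two-byte sequence that `q` does not complete.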


# Feedback Requested

I am curious about:
 * other possible approaches
 * fitsbits opinions on the most practical solution
 * other people interested in having UTF-8 in BINTABLE columns

Currently, Solution 1 seems the simplest and Solution 2 the safest,
but I welcome constructive comments and experience from the community.

Best regards,

--

Francois-Xavier Pineau
Research Engineer
Tel: +33 (0)3 68 85 24 14
francois-xavier.pineau at astro.unistra.fr

Centre de Données astronomiques de Strasbourg (CDS)
11, rue de l'Université - E03



_______________________________________________
fitsbits mailing list
fitsbits at listmgr.nrao.edu
https://listmgr.nrao.edu/mailman/listinfo/fitsbits


