[fitsbits] UTF-8 in BINTABLE String Columns {External}

Francois Ochsenbein Francois.Ochsenbein at free.fr
Fri Apr 3 06:50:13 EDT 2026


Hello FX,

Apparently the "current draft of VOTable 1.6" does not exist?
Personally, I would have thought it important to keep the
distinction between pure ASCII and UTF-8, because otherwise the FITS
principle of guaranteeing that changes to the standard must not affect
existing applications would be violated… But I would be curious to
know why HATS needs UTF-8?

Wishing you a nice end of the week, and perhaps see you soon?

Best regards, François

==> On Thursday 2026-03-26 at 14:43+0100,
    Francois-Xavier PINEAU via fitsbits <fitsbits at listmgr.nrao.edu>
wrote:

>Dear fitsbits,
>
>
># Background
>
>VOTable (v1.5) is closely compatible with the FITS Binary Table format:
>https://www.ivoa.net/documents/VOTable/20250116/REC-VOTable-1.5.html#tth_sEc2.3
>
>In the current draft of VOTable 1.6
>(https://github.com/ivoa-std/VOTable/releases/download/auto-pdf-preview/VOTable-draft.pdf),
>UTF-8 strings replace the previous ASCII-only strings.
>
>If FITS cannot store UTF-8, *lossless round-trip conversion from
>VOTable to FITS will no longer be possible*.
>Some limitations already exist (e.g., unsigned integer logical types), 
>but UTF-8 seems more critical.
>
>Personal use cases include using HEALPix-sorted and indexed
>BINTABLEs to build on-the-fly HATS products
>or intermediary HiPS catalogue representations from VizieR data (which
>will contain more and more UTF-8).
>* HATS: https://www.ivoa.net/documents/Notes/HATS/
>* HIPS catalogue: https://www.ivoa.net/documents/HiPS/
>* VizieR: https://vizier.u-strasbg.fr/
>
># Possible Solutions
>
>## 1. Use UTF-8 in existing `TFORMn=rA`
>
>As in VOTable 1.6, interpret `r` as a number of bytes instead of a
>number of characters.
>May break truncation operations (TDISP) if a multi-byte UTF-8
>character is split.
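
The truncation risk mentioned for Solution 1 can be illustrated with a minimal Python sketch. The helper name `truncate_utf8` is hypothetical and is not part of any FITS library; it only shows one way a reader could cut a UTF-8 field to a byte limit without splitting a multi-byte character, assuming the input is valid UTF-8.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Truncate valid UTF-8 bytes to at most max_bytes, never splitting a character.

    Hypothetical helper, for illustration only.
    """
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    # Back up until `end` points at the first byte of a character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


field = "caf\u00e9".encode("utf-8")   # 5 bytes: the last character takes 2 bytes
naive = field[:4]                     # plain byte slicing splits that character
safe = truncate_utf8(field, 4)        # drops the incomplete character instead
```

Here `naive` is no longer valid UTF-8, while `safe` still decodes cleanly. A reader that treats `r` as a character count and slices bytes accordingly would produce the `naive` result.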
>
>## 2. Logical type "UTF-8" backed by a byte array
>
>TFORMn = rB
>TLOGTn = 'UTF-8'  / LOGT stands for LOGical Type
>
>Unaware readers see a byte array; UTF-8 aware readers interpret it as
>a string.
>Introduces two string types in FITS (ASCII and UTF-8).
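
How a reader might behave under Solution 2 can be sketched as follows. This is only an illustration under stated assumptions: the header is modelled as a plain Python dict, `read_cell` is a hypothetical function, `TLOGTn` is the keyword proposed in the message (not part of the current FITS standard), and NUL padding of unused trailing bytes is an assumption that the proposal does not specify.

```python
def read_cell(header: dict, n: int, raw: bytes, utf8_aware: bool = True):
    """Interpret one cell of column n (hypothetical reader, illustration only)."""
    tform = header[f"TFORM{n}"].strip()
    logical = header.get(f"TLOGT{n}", "").strip()   # proposed keyword, may be absent
    if utf8_aware and tform.endswith("B") and logical == "UTF-8":
        # Assumption: unused trailing bytes are NUL-padded.
        return raw.rstrip(b"\x00").decode("utf-8")
    if tform.endswith("A"):
        # Classic character field: stop at the first NUL, drop trailing blanks.
        return raw.split(b"\x00", 1)[0].decode("ascii").rstrip(" ")
    # Any other case (including an unaware reader): hand back the raw bytes.
    return raw


header = {"TFORM1": "8B", "TLOGT1": "UTF-8"}
raw = "caf\u00e9".encode("utf-8") + b"\x00" * 3      # 5 bytes of text + 3 of padding
aware = read_cell(header, 1, raw)                     # decoded as a string
unaware = read_cell(header, 1, raw, utf8_aware=False) # left as a byte array
```

The sketch shows the trade-off described above: existing readers keep working because they only see `rB`, at the cost of having two distinct string representations to handle.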
>
>## 3. New TFORM type (e.g., `TFORMn=rU`)
>
>Definite breakage for current readers.
>
>
># Existing Implementations
>
>  * TOPCAT/STILTS (Java): Prototype supports Solutions 1 and 2 for 
>read/write (private communication with Mark Taylor).
>  * fitstable (Rust): Supports Solutions 1 and 2 for reading 
>(https://github.com/cds-astro/cds-fitstable-rust).
>  * VizieR: Appears to provide UTF-8 in TFORMn=rA columns (Solution 1).
>  * ??
>
>
># Feedback Requested
>
>I am curious about:
>  * other possible approaches
>  * fitsbits opinions on the most practical solution
>  * other people interested in having UTF-8 in BINTABLE columns
>
>Currently, Solution 1 seems the simplest and Solution 2 the safest,
>but I welcome constructive comments and experience from the community.
>
>Best regards,
>


-- 
======================================================================
Francois Ochsenbein at free.fr ---   67380 Lingolsheim 
======================================================================


