[fitsbits] UTF-8 in BINTABLE String Columns

Mark Taylor m.b.taylor at bristol.ac.uk
Mon Apr 6 05:53:47 EDT 2026


Thank you Bill.

To confirm, solution 1 proposed by FX, changing the interpretation
of TFORM=A to be UTF-8 rather than ASCII, would be fully consistent
with this "once FITS always FITS" rule.  Character content as
currently defined would remain valid FITS, with the same interpretation 
as now, since ASCII encoding is identical to UTF-8 encoding for
any currently legal FITS character (characters 0x20 to 0x7E).
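A minimal Python sketch of the byte-level fact behind this claim. Every character in the range 0x20 to 0x7E encodes to the same single byte under UTF-8 as under ASCII, so existing 'A' column bytes decode to identical strings either way. The function names are illustrative only.

```python
# Sketch: ASCII and UTF-8 agree byte-for-byte on 0x20-0x7E,
# the character range currently legal in FITS 'A' fields.

LEGAL_FITS_CHARS = range(0x20, 0x7F)  # 0x20..0x7E inclusive


def encodings_agree() -> bool:
    """True if each legal character encodes to the same single byte in both."""
    return all(
        chr(cp).encode("ascii") == chr(cp).encode("utf-8") == bytes([cp])
        for cp in LEGAL_FITS_CHARS
    )


def decodes_identically(raw: bytes) -> bool:
    """True if an existing ASCII 'A' field reads the same when decoded as UTF-8."""
    return raw.decode("ascii") == raw.decode("utf-8")


if __name__ == "__main__":
    print(encodings_agree())                              # True
    print(decodes_identically(bytes(LEGAL_FITS_CHARS)))   # True
```

The converse does not hold: a UTF-8 reader accepts bytes above 0x7F that an ASCII-only reader rejects, which is why the change only widens what is valid.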

Mark

On Sat, 4 Apr 2026, William Pence via fitsbits wrote:

> Roughly translated, Francois wrote:  “it would have seemed important to keep the distinction between pure ascii and utf-8, because of the FITS logic to ensure that changes in the standard should not affect existing applications would be violated”.
> 
> Just to be clear, there has never been a requirement that changes to the definition of the FITS format must not cause problems for existing software applications.  The actual requirement, as stated in section 3.7 of the FITS Standard is that “Any structure that is a valid FITS structure shall remain a valid FITS structure at all future times” (often referred to as the “once FITS, always FITS” rule). 
> 
> Bill
> 
> 
> > On Apr 3, 2026, at 11:16 PM, Francois Ochsenbein via fitsbits <fitsbits at listmgr.nrao.edu> wrote:
> > 
> > 
> > Hello FX,
> > 
> > Apparently the "current draft of VOTable 1.6" does not exist?
> > Personally, it would have seemed important to me to keep the
> > distinction between pure ASCII and UTF-8, because otherwise the FITS
> > principle of guaranteeing that changes to the standard must not affect
> > existing applications would be violated… But I would be curious to
> > know why HATS needs UTF-8.
> > 
> > Have a nice weekend, and perhaps see you soon?
> > 
> > Best wishes, François
> > 
> > ==> On Thursday 2026-03-26 at 14:43+0100,
> >    Francois-Xavier PINEAU via fitsbits <fitsbits at listmgr.nrao.edu>
> > wrote:
> > 
> >> Dear fitsbits,
> >> 
> >> 
> >> # Background
> >> 
> >> VOTable (v1.5) is closely compatible with the FITS Binary Table format:
> >> https://www.ivoa.net/documents/VOTable/20250116/REC-VOTable-1.5.html#tth_sEc2.3
> >> 
> >> In the current draft of VOTable 1.6
> >> (https://github.com/ivoa-std/VOTable/releases/download/auto-pdf-preview/VOTable-draft.pdf),
> >> UTF-8 strings replace the previous ASCII-only strings.
> >> 
> >> If FITS cannot store UTF-8, *lossless round-trip conversion from
> >> VOTable to FITS will no longer be possible*.
> >> Some limitations already exist (e.g., unsigned integer logical types),
> >> but UTF-8 seems more critical.
> >> 
> >> My own use cases include using HEALPix-sorted and indexed
> >> BINTABLEs to build HATS products on the fly,
> >> or intermediate HiPS catalogue representations from VizieR data (which
> >> will contain more and more UTF-8).
> >> * HATS: https://www.ivoa.net/documents/Notes/HATS/
> >> * HIPS catalogue: https://www.ivoa.net/documents/HiPS/
> >> * VizieR: https://vizier.u-strasbg.fr/
> >> 
> >> # Possible Solutions
> >> 
> >> ## 1. Use UTF-8 in existing `TFORMn=rA`
> >> 
> >> As in VOTable 1.6, interpret `r` as a number of bytes instead of characters.
> >> This may break truncation operations (TDISPn) if a multi-byte UTF-8
> >> character is split.
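A minimal sketch of the truncation issue, assuming a writer that must fit a string into an `rA` field whose width `r` counts bytes. The cut is moved back to the nearest character boundary so that no multi-byte sequence is split. `truncate_utf8` is a hypothetical helper, and filling the unused tail with NUL bytes is this sketch's choice.

```python
def truncate_utf8(text: str, width: int) -> bytes:
    """Encode text as UTF-8, cut to at most `width` bytes without splitting
    a character, then pad with NUL to exactly `width` bytes (assumption)."""
    data = text.encode("utf-8")
    if len(data) <= width:
        return data.ljust(width, b"\x00")
    cut = width
    # UTF-8 continuation bytes match 10xxxxxx (0x80-0xBF). If the first
    # dropped byte is one, the cut falls inside a character: back up.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].ljust(width, b"\x00")


if __name__ == "__main__":
    name = "Besançon"                    # 8 characters, 9 UTF-8 bytes
    # A naive name.encode("utf-8")[:6] gives b"Besan\xc3", splitting 'ç'.
    safe = truncate_utf8(name, 6)        # b"Besan\x00"
    print(safe.rstrip(b"\x00").decode("utf-8"))  # Besan
```

Because `r` counts bytes, an 8A field would hold only "Besanço" under this rule, one character short of the 8-character name.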
> >> 
> >> ## 2. Logical type "UTF-8" backed by a byte array
> >> 
> >> TFORMn = rB
> >> TLOGTn = 'UTF-8'  / LOGT stands for LOGical Type
> >> 
> >> Readers unaware of the convention see a byte array; UTF-8-aware
> >> readers interpret it as a string.
> >> Drawback: introduces two string types in FITS (ASCII and UTF-8).
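A minimal sketch of how a reader might honour this convention. TLOGTn is only the keyword proposed here, not part of the current FITS Standard. `read_cell` is hypothetical, `header` is a plain dict standing in for a parsed FITS header, and stripping trailing NUL bytes is an assumption about padding.

```python
from typing import Mapping, Union


def read_cell(header: Mapping[str, str], n: int, raw: bytes) -> Union[bytes, str]:
    """Return the column-n cell as str if flagged UTF-8, else as raw bytes."""
    tform = header.get(f"TFORM{n}", "").strip()
    logical = header.get(f"TLOGT{n}", "").strip()
    if tform.endswith("B") and logical == "UTF-8":
        # Assumption: unused trailing bytes are NUL padding.
        return raw.rstrip(b"\x00").decode("utf-8")
    return raw  # what a reader unaware of TLOGTn sees: a byte array


if __name__ == "__main__":
    hdr = {"TFORM1": "10B", "TLOGT1": "UTF-8", "TFORM2": "10B"}
    cell = "Besançon".encode("utf-8") + b"\x00"   # 9 bytes + 1 pad = 10
    print(read_cell(hdr, 1, cell))   # Besançon
    print(read_cell(hdr, 2, cell))   # b'Besan\xc3\xa7on\x00'
```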
> >> 
> >> ## 3. New TFORM type (e.g., `TFORMn=rU`)
> >> 
> >> Definitely breaks current readers, which do not recognize the new type code.
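To illustrate why, here is a sketch of the kind of TFORMn check a current reader might perform. `parse_tform` is hypothetical, and real libraries differ in detail. Any type code outside the set defined by the current Standard is rejected, so a column declared as `rU` cannot be read at all.

```python
import re
from typing import Tuple

# BINTABLE type codes defined in the current FITS Standard.
KNOWN_CODES = frozenset("LXBIJKAEDCMPQ")


def parse_tform(tform: str) -> Tuple[int, str]:
    """Split a TFORMn value such as '20A' into (repeat count, type code)."""
    m = re.fullmatch(r"\s*(\d*)([A-Z])(.*)", tform)
    if m is None:
        raise ValueError(f"malformed TFORM {tform!r}")
    repeat = int(m.group(1)) if m.group(1) else 1
    code = m.group(2)
    if code not in KNOWN_CODES:
        # A new 'U' code lands here in every reader written this way.
        raise ValueError(f"unsupported TFORM type code {code!r}")
    return repeat, code


if __name__ == "__main__":
    print(parse_tform("20A"))   # (20, 'A')
    try:
        parse_tform("20U")
    except ValueError as err:
        print(err)              # unsupported TFORM type code 'U'
```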
> >> 
> >> 
> >> # Existing Implementations
> >> 
> >>  * TOPCAT/STILTS (Java): Prototype supports Solutions 1 and 2 for
> >> read/write (private communication with Mark Taylor).
> >>  * fitstable (Rust): Supports Solutions 1 and 2 for reading
> >> (https://github.com/cds-astro/cds-fitstable-rust).
> >>  * VizieR: Appears to provide UTF-8 in TFORMn=rA columns (Solution 1).
> >>  * ??
> >> 
> >> 
> >> # Feedback Requested
> >> 
> >> I am curious about:
> >>  * other possible approaches
> >>  * fitsbits opinions on the most practical solution
> >>  * other people interested in having UTF-8 in BINTABLE columns
> >> 
> >> Currently, Solution 1 seems the simplest and Solution 2 the safest,
> >> but I welcome constructive comments and experience from the community.
> >> 
> >> Best regards,
> >> 
> > 
> > 
> > --
> > ======================================================================
> > Francois Ochsenbein at free.fr ---   67380 Lingolsheim
> > ======================================================================
> > 
> > _______________________________________________
> > fitsbits mailing list
> > fitsbits at listmgr.nrao.edu
> > https://listmgr.nrao.edu/mailman/listinfo/fitsbits
> 
> 

--
Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk          https://www.star.bristol.ac.uk/mbt/

