[fitsbits] UTF-8 in BINTABLE String Columns {External}

Arnold Rots arots at cfa.harvard.edu
Mon Apr 6 10:12:43 EDT 2026


Agreed!


Arnold H Rots

Research Associate

SAO/HEAD

Center for Astrophysics | Harvard & Smithsonian

Email: arots at cfa.harvard.edu

Office: +1 617 496 7701 | Cell: +1 617 721 6756

60 Garden Street | MS 69 | Cambridge, MA 02138 | USA



On Mon, Apr 6, 2026, 05:54 Mark Taylor via fitsbits <
fitsbits at listmgr.nrao.edu> wrote:

> Thank you Bill.
>
> To confirm, solution 1 proposed by FX, changing the interpretation
> of TFORM=A to be UTF-8 rather than ASCII, would be fully consistent
> with this "once FITS always FITS" rule.  Character content as
> currently defined would remain valid FITS, with the same interpretation
> as now, since ASCII encoding is identical to UTF-8 encoding for
> any currently legal FITS character (characters 0x20 to 0x7E).
>
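The byte-level claim above can be checked directly. The following is a minimal sketch in Python (the sample name is only an illustration, not taken from any FITS file): it confirms that every currently legal FITS character has the same single-byte form under ASCII and UTF-8, and shows where the two interpretations start to differ.

```python
# Every currently legal FITS character (0x20 to 0x7E) as one byte string.
legal = bytes(range(0x20, 0x7F))

# Decoding under either encoding yields the same text ...
text = legal.decode("ascii")
assert text == legal.decode("utf-8")

# ... and encoding that text gives back the same bytes under both.
assert text.encode("utf-8") == text.encode("ascii") == legal

# Outside that range the byte count and the character count diverge:
# 'ç' occupies two bytes (0xC3 0xA7) in UTF-8.
name = "François"  # illustrative value only
utf8_bytes = name.encode("utf-8")
print(len(name), len(utf8_bytes))  # prints "8 9"
```

So an existing ASCII-only column reads the same under either interpretation; only columns that actually contain non-ASCII characters are affected.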
> Mark
>
> On Sat, 4 Apr 2026, William Pence via fitsbits wrote:
>
> > Roughly translated, Francois wrote:  “it would have seemed important to
> > keep the distinction between pure ASCII and UTF-8, because the FITS
> > logic of ensuring that changes in the standard do not affect existing
> > applications would be violated”.
> >
> > Just to be clear, there has never been a requirement that changes to the
> > definition of the FITS format must not cause problems for existing software
> > applications.  The actual requirement, as stated in section 3.7 of the FITS
> > Standard, is that “Any structure that is a valid FITS structure shall remain
> > a valid FITS structure at all future times” (often referred to as the “once
> > FITS, always FITS” rule).
> >
> > Bill
> >
> >
> > > On Apr 3, 2026, at 11:16 PM, Francois Ochsenbein via fitsbits <
> > > fitsbits at listmgr.nrao.edu> wrote:
> > >
> > >
> > > Hello FX,
> > >
> > > Apparently the "current draft of VOTable 1.6" does not exist?
> > > Personally, I would have thought it important to keep the
> > > distinction between pure ASCII and UTF-8, because the FITS logic of
> > > guaranteeing that changes to the standard must not affect existing
> > > applications would be violated... But I would be curious to know
> > > why HATS needs UTF-8?
> > >
> > > Wishing you a good end of the week, and perhaps see you soon?
> > >
> > > Best wishes, François
> > >
> > > ==> On Thursday 2026-03-26 at 14:43+0100,
> > >    Francois-Xavier PINEAU via fitsbits <fitsbits at listmgr.nrao.edu>
> > > wrote:
> > >
> > >> Dear fitsbits,
> > >>
> > >>
> > >> # Background
> > >>
> > >> VOTable (v1.5) is closely compatible with the FITS Binary Table format:
> > >> https://www.ivoa.net/documents/VOTable/20250116/REC-VOTable-1.5.html#tth_sEc2.3
> > >>
> > >> In the current draft of VOTable 1.6
> > >> (https://github.com/ivoa-std/VOTable/releases/download/auto-pdf-preview/VOTable-draft.pdf),
> > >> UTF-8 strings replace the previous ASCII-only strings.
> > >>
> > >> If FITS cannot store UTF-8, *lossless round-trip conversion from
> > >> VOTable to FITS will no longer be possible*.
> > >> Some limitations already exist (e.g., unsigned integer logical types),
> > >> but UTF-8 seems more critical.
> > >>
> > >> Personal use cases include using HEALPix-sorted and indexed
> > >> BINTABLEs to build on-the-fly HATS products
> > >> or intermediary HiPS catalogue representations from VizieR data (which
> > >> will contain more and more UTF-8).
> > >> * HATS: https://www.ivoa.net/documents/Notes/HATS/
> > >> * HiPS catalogue: https://www.ivoa.net/documents/HiPS/
> > >> * VizieR: https://vizier.u-strasbg.fr/
> > >>
> > >> # Possible Solutions
> > >>
> > >> ## 1. Use UTF-8 in existing `TFORMn=rA`
> > >>
> > >> As in VOTable 1.6, interpret `r` as a number of bytes rather than a
> > >> number of characters.
> > >> May break truncation operations (TDISP) if a multi-byte UTF-8
> > >> character is split.
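The truncation hazard mentioned for Solution 1 can be illustrated with a short sketch. The helper name `truncate_utf8` is hypothetical and not part of any FITS library; it assumes the input is valid UTF-8 and backs the cut point up until it falls on a character boundary.

```python
def truncate_utf8(data: bytes, width: int) -> bytes:
    """Cut data to at most width bytes without splitting a UTF-8 character.

    Hypothetical helper; assumes data is valid UTF-8.
    """
    if len(data) <= width:
        return data
    end = width
    # Continuation bytes have the bit pattern 10xxxxxx. If the first byte
    # to be cut off is one of them, the cut would split a character, so
    # back up until the cut lands on the first byte of a character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


field = "Gößl".encode("utf-8")      # 6 bytes for 4 characters
naive = field[:2]                   # b"G\xc3": no longer valid UTF-8
safe = truncate_utf8(field, 2)      # b"G"
```

A naive byte slice such as `field[:2]` raises `UnicodeDecodeError` when decoded, whereas the boundary-aware cut yields a shorter but still valid string.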
> > >>
> > >> ## 2. Logical type "UTF-8" backed by a byte array
> > >>
> > >> TFORMn = rB
> > >> TLOGTn = 'UTF-8'  / LOGT stands for LOGical Type
> > >>
> > >> Unaware readers see a byte array; UTF-8 aware readers interpret it as
> > >> a string.
> > >> Introduces two string types in FITS (ASCII and UTF-8).
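How the two kinds of reader would treat the same cell under Solution 2 can be sketched as follows. This is only an illustration: `read_cell` is a hypothetical function, the `TLOGTn` keyword is the one proposed above rather than an existing FITS keyword, and the NUL padding of short strings is an assumption that the proposal does not specify.

```python
from typing import List, Optional, Union


def read_cell(raw: bytes, tform: str,
              tlogt: Optional[str] = None) -> Union[str, List[int]]:
    """Interpret one BINTABLE cell under the Solution 2 convention (sketch)."""
    if not tform.endswith("B"):
        raise ValueError(f"unsupported TFORM in this sketch: {tform!r}")
    if tlogt == "UTF-8":
        # UTF-8 aware reader: drop trailing NUL padding (an assumption
        # made here, not something the proposal defines), then decode.
        return raw.rstrip(b"\x00").decode("utf-8")
    # Unaware reader: the column is a plain array of unsigned bytes.
    return list(raw)


cell = "Gößl".encode("utf-8").ljust(8, b"\x00")   # an 8B field
print(read_cell(cell, "8B", "UTF-8"))             # prints "Gößl"
print(read_cell(cell, "8B"))
```

The second call shows what a reader that ignores `TLOGTn` would return: the same eight values, but as integers rather than text.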
> > >>
> > >> ## 3. New TFORM type (e.g., `TFORMn=rU`)
> > >>
> > >> Definite breakage for current readers.
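The breakage noted for Solution 3 can be illustrated with a hypothetical strict reader that validates the TFORM data type code against the codes defined for binary tables in the current FITS Standard. Real readers differ in how they handle unknown codes; `parse_tform` below is a sketch written for this illustration, not the behaviour of any particular library.

```python
import re
from typing import Tuple

# Data type codes defined for binary tables in the current FITS Standard.
KNOWN_CODES = "LXBIJKAEDCMPQ"
TFORM_RE = re.compile(r"^(\d*)([A-Z])")


def parse_tform(value: str) -> Tuple[int, str]:
    """Return (repeat count, type code) for a TFORMn value (sketch)."""
    match = TFORM_RE.match(value.strip())
    if match is None:
        raise ValueError(f"malformed TFORM value: {value!r}")
    repeat = int(match.group(1)) if match.group(1) else 1
    code = match.group(2)
    if code not in KNOWN_CODES:
        # A new code such as 'U' ends up here in a reader of this kind.
        raise ValueError(f"unknown TFORM data type {code!r}")
    return repeat, code


print(parse_tform("16A"))   # prints "(16, 'A')"
```

With this reader, `parse_tform("16U")` raises `ValueError`, so the whole column (and possibly the whole table) becomes unreadable until the software is updated.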
> > >>
> > >>
> > >> # Existing Implementations
> > >>
> > >>  * TOPCAT/STILTS (Java): Prototype supports Solutions 1 and 2 for
> > >> read/write (private communication with Mark Taylor).
> > >>  * fitstable (Rust): Supports Solutions 1 and 2 for reading
> > >> (https://github.com/cds-astro/cds-fitstable-rust).
> > >>  * VizieR: Appears to provide UTF-8 in TFORMn=rA columns (Solution 1).
> > >>  * ??
> > >>
> > >>
> > >> # Feedback Requested
> > >>
> > >> I am curious about:
> > >>  * other possible approaches
> > >>  * fitsbits opinions on the most practical solution
> > >>  * other people interested in having UTF-8 in BINTABLE columns
> > >>
> > >> Currently, Solution 1 seems the simplest and Solution 2 the safest,
> > >> but I welcome constructive comments and experience from the community.
> > >>
> > >> Best regards,
> > >>
> > >
> > >
> > > --
> > > ======================================================================
> > > Francois Ochsenbein at free.fr ---   67380 Lingolsheim
> > > ======================================================================
> > >
> > > _______________________________________________
> > > fitsbits mailing list
> > > fitsbits at listmgr.nrao.edu
> > > https://listmgr.nrao.edu/mailman/listinfo/fitsbits
> >
>
> --
> Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
> m.b.taylor at bristol.ac.uk          https://www.star.bristol.ac.uk/mbt/