<div dir="ltr"><div>Question: When UTF-8 is specified, is it implicitly assumed that characters may occupy 1 to 4 bytes,</div><div>or is the intent to still restrict characters to one byte?</div><div>If the latter, I don't see much of a problem. Old ASCII-compliant files can still be read correctly;</div><div>it's just old ASCII-compliant readers that can't correctly interpret the new files. But that software can be updated.</div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><br><p dir="ltr">Arnold H Rots</p><p dir="ltr">Research Associate</p><p dir="ltr">SAO/HEAD</p><p dir="ltr">Center for Astrophysics | Harvard & Smithsonian</p><br><p dir="ltr">Email: <a href="mailto:arots@cfa.harvard.edu" target="_blank">arots@cfa.harvard.edu</a></p><p dir="ltr">Office: +1 617 496 7701 | Cell: +1 617 721 6756</p><p dir="ltr">60 Garden Street | MS 69 | Cambridge, MA 02138 | USA</p><br><p dir="ltr"><a href="http://cfa.harvard.edu/" target="_blank">cfa.harvard.edu</a></p></div></div><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Sun, Mar 29, 2026 at 8:00 PM James Tocknell via fitsbits <<a href="mailto:fitsbits@listmgr.nrao.edu">fitsbits@listmgr.nrao.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I'm not sure there's any value in supporting UTF-16 or UTF-32; <a href="https://utf8everywhere.org/" rel="noreferrer" target="_blank">https://utf8everywhere.org/</a> provides details as to why UTF-8 should be 
the standard interchange format (basically, UTF-16 and UTF-32 both take up more space and encourage misconceptions about Unicode). Also, for things like paths on Windows (as opposed to Unix systems, where a path is a sequence of 8-bit bytes in some encoding), you can't rely on getting valid UTF-16 anyway (see <a href="https://wtf-8.codeberg.page/" rel="noreferrer" target="_blank">https://wtf-8.codeberg.page/</a>). Practically speaking, if something accepts ASCII it will probably accept UTF-8 (and someone has likely already slipped in Latin-1, unless people are validating that the data is ASCII only); that is not true of UTF-16 or UTF-32.<br>
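<br>
To make the size and compatibility points concrete, here is a minimal Python sketch. The sample characters and the string "NGC 1275" are arbitrary illustrations, not taken from any FITS file.<br>
<br>
```python
# Encoded size, in bytes, of a few characters in each Unicode encoding form.
samples = ["A", "é", "€", "😀"]
sizes = {
    ch: (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )
    for ch in samples
}
for ch, (u8, u16, u32) in sizes.items():
    print(f"{ch!r}: UTF-8={u8}  UTF-16={u16}  UTF-32={u32}")

# Pure ASCII bytes are already valid UTF-8 and decode to the same text.
ascii_bytes = b"NGC 1275"
as_utf8 = ascii_bytes.decode("utf-8")

# The same bytes read as UTF-16 pair up into unrelated code units.
as_utf16 = ascii_bytes.decode("utf-16-le")
print(as_utf8 == "NGC 1275", as_utf16 == "NGC 1275")
```
<br>
UTF-8 uses between one and four bytes per character, and only the ASCII range fits in a single byte.<br>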
<br>
James<br>
<br>
________________________________________<br>
From: fitsbits <<a href="mailto:fitsbits-bounces@listmgr.nrao.edu" target="_blank">fitsbits-bounces@listmgr.nrao.edu</a>> on behalf of Barrett, Paul via fitsbits <<a href="mailto:fitsbits@listmgr.nrao.edu" target="_blank">fitsbits@listmgr.nrao.edu</a>><br>
Sent: Friday, 27 March 2026 1:00 AM<br>
To: Francois-Xavier PINEAU<br>
Cc: <a href="mailto:fitsbits@nrao.edu" target="_blank">fitsbits@nrao.edu</a><br>
Subject: Re: [fitsbits] UTF-8 in BINTABLE String Columns {External}<br>
<br>
Because this is somewhat of a breaking change, would it not be beneficial in the long run to extend this to UTF-16 and UTF-32?<br>
<br>
 -- Paul<br>
<br>
<br>
On Thu, Mar 26, 2026 at 9:44 AM Francois-Xavier PINEAU via fitsbits <<a href="mailto:fitsbits@listmgr.nrao.edu" target="_blank">fitsbits@listmgr.nrao.edu</a>> wrote:<br>
<br>
Dear fitsbits,<br>
<br>
# Background<br>
<br>
VOTable (v1.5) is closely compatible with the FITS Binary Table format:<br>
<a href="https://www.ivoa.net/documents/VOTable/20250116/REC-VOTable-1.5.html#tth_sEc2.3" rel="noreferrer" target="_blank">https://www.ivoa.net/documents/VOTable/20250116/REC-VOTable-1.5.html#tth_sEc2.3</a><br>
<br>
In the current draft of VOTable 1.6<br>
(<a href="https://github.com/ivoa-std/VOTable/releases/download/auto-pdf-preview/VOTable-draft.pdf" rel="noreferrer" target="_blank">https://github.com/ivoa-std/VOTable/releases/download/auto-pdf-preview/VOTable-draft.pdf</a>),<br>
UTF-8 strings replace the previous ASCII-only strings.<br>
<br>
If FITS cannot store UTF-8, lossless round-trip conversion from VOTable to FITS will no longer be possible.<br>
Some limitations already exist (e.g., unsigned integer logical types), but UTF-8 seems more critical.<br>
<br>
Personal use cases include using HEALPix-sorted and indexed BINTABLEs to build on-the-fly HATS products<br>
or intermediary HiPS catalogue representations from VizieR data (which will contain more and more UTF-8).<br>
* HATS: <a href="https://www.ivoa.net/documents/Notes/HATS/" rel="noreferrer" target="_blank">https://www.ivoa.net/documents/Notes/HATS/</a><br>
* HiPS catalogue: <a href="https://www.ivoa.net/documents/HiPS/" rel="noreferrer" target="_blank">https://www.ivoa.net/documents/HiPS/</a><br>
* VizieR: <a href="https://vizier.u-strasbg.fr/" rel="noreferrer" target="_blank">https://vizier.u-strasbg.fr/</a><br>
<br>
<br>
# Possible Solutions<br>
<br>
## 1. Use UTF-8 in existing `TFORMn=rA`<br>
<br>
As in VOTable 1.6, interpret `r` as a number of bytes rather than a number of characters.<br>
May break truncation operations (e.g. applying a `TDISPn` width) if a multi-byte UTF-8 character is split.<br>
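<br>
A minimal Python sketch of both points, assuming the stored bytes are valid UTF-8. The helper names `column_width` and `truncate_utf8` are hypothetical and do not come from any existing FITS library.<br>
<br>
```python
def column_width(values):
    """Width r in bytes needed to hold every value when encoded as UTF-8."""
    return max(len(v.encode("utf-8")) for v in values)


def truncate_utf8(data, max_bytes):
    """Cut data to at most max_bytes without splitting a UTF-8 sequence.

    Assumes data is valid UTF-8.
    """
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # Continuation bytes look like 10xxxxxx. If the first dropped byte is one,
    # the cut falls inside a character, so back up to that character's start.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


name = "Strömgren".encode("utf-8")  # 9 characters, 10 bytes
naive = name[:4]                    # ends in the middle of "ö"
safe = truncate_utf8(name, 4)       # drops the incomplete "ö"
```
<br>
Decoding `naive` as UTF-8 raises `UnicodeDecodeError`, whereas `safe` decodes cleanly at the cost of being one byte shorter than the requested width.<br>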
<br>
## 2. Logical type "UTF-8" backed by a byte array<br>
<br>
TFORMn = rB<br>
TLOGTn = 'UTF-8'  / LOGT stands for LOGical Type<br>
<br>
Unaware readers see a byte array; UTF-8 aware readers interpret it as a string.<br>
Introduces two string types in FITS (ASCII and UTF-8).<br>
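<br>
A rough Python sketch of how the two kinds of reader might treat one fixed-width cell of such a column. The function `read_cell` is hypothetical, and the NUL termination of short values is an assumption borrowed from the existing `rA` convention, not something this proposal specifies.<br>
<br>
```python
def read_cell(raw, tlogt=None):
    """Interpret one fixed-width cell of an rB column.

    raw   -- the r bytes stored for this cell
    tlogt -- value of the proposed TLOGTn keyword, or None if the reader
             does not know about it
    """
    if tlogt == "UTF-8":
        # Assumption: a NUL byte ends the string, as with rA columns.
        return raw.split(b"\x00", 1)[0].decode("utf-8")
    # Unaware reader: a plain array of unsigned 8-bit integers.
    return list(raw)


# 9 Cyrillic letters take 18 bytes; pad to an illustrative width of r = 20.
cell = "Андромеда".encode("utf-8").ljust(20, b"\x00")

aware = read_cell(cell, "UTF-8")  # decoded string
unaware = read_cell(cell)         # list of 20 integers
```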
<br>
## 3. New TFORM type (e.g., `TFORMn=rU`)<br>
<br>
Definite breakage for current readers.<br>
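<br>
As an illustration of that breakage, here is a sketch of a strict TFORM parser in Python. The parser itself is hypothetical and real readers may fail in other ways; the list of type codes is the set currently defined for binary tables.<br>
<br>
```python
import re

# Data type codes currently defined for BINTABLE TFORMn values.
KNOWN_CODES = "LXBIJKAEDCMPQ"
TFORM_RE = re.compile(r"^(\d*)([A-Z])")


def parse_tform(value):
    """Return (repeat, code) for a TFORMn value, rejecting unknown codes."""
    match = TFORM_RE.match(value.strip())
    if match is None or match.group(2) not in KNOWN_CODES:
        raise ValueError(f"unsupported TFORM value: {value!r}")
    # A missing repeat count defaults to 1.
    repeat = int(match.group(1)) if match.group(1) else 1
    return repeat, match.group(2)


accepted = [parse_tform("20A"), parse_tform("20B")]
try:
    parse_tform("20U")
    rejected = False
except ValueError:
    rejected = True
```
<br>
A reader written this way accepts the columns produced by Solutions 1 and 2 but refuses the whole table under Solution 3.<br>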
<br>
<br>
# Existing Implementations<br>
<br>
 * TOPCAT/STILTS (Java): Prototype supports Solutions 1 and 2 for read/write (private communication with Mark Taylor).<br>
 * fitstable (Rust): Supports Solutions 1 and 2 for reading (<a href="https://github.com/cds-astro/cds-fitstable-rust" rel="noreferrer" target="_blank">https://github.com/cds-astro/cds-fitstable-rust</a>).<br>
 * VizieR: Appears to provide UTF-8 in TFORMn=rA columns (Solution 1).<br>
 * ??<br>
<br>
<br>
# Feedback Requested<br>
<br>
I am curious about:<br>
 * other possible approaches<br>
 * fitsbits opinions on the most practical solution<br>
 * other people interested in having UTF-8 in BINTABLE columns<br>
<br>
Currently, Solution 1 seems the simplest and Solution 2 the safest,<br>
but I welcome constructive comments and experience from the community.<br>
<br>
Best regards,<br>
<br>
--<br>
<br>
Francois-Xavier Pineau<br>
Ingénieur de Recherche<br>
Tél : +33 (0)3 68 85 24 14,<br>
<a href="mailto:francois-xavier.pineau@astro.unistra.fr" target="_blank">francois-xavier.pineau@astro.unistra.fr</a><br>
<br>
Centre de Données astronomiques de Strasbourg (CDS)<br>
11, rue de l'Université - E03<br>
<br>
<br>
<br>
_______________________________________________<br>
fitsbits mailing list<br>
<a href="mailto:fitsbits@listmgr.nrao.edu" target="_blank">fitsbits@listmgr.nrao.edu</a><br>
<a href="https://listmgr.nrao.edu/mailman/listinfo/fitsbits" rel="noreferrer" target="_blank">https://listmgr.nrao.edu/mailman/listinfo/fitsbits</a><br>
</blockquote></div>