<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Dear fitsbits,</p>
<p><br>
# Background</p>
<p>VOTable (v1.5) is closely compatible with the FITS Binary Table
format:<br>
<a class="moz-txt-link-freetext" href="https://www.ivoa.net/documents/VOTable/20250116/REC-VOTable-1.5.html#tth_sEc2.3">https://www.ivoa.net/documents/VOTable/20250116/REC-VOTable-1.5.html#tth_sEc2.3</a></p>
<p>In the current draft of VOTable 1.6<br>
<a class="moz-txt-link-freetext" href="https://github.com/ivoa-std/VOTable/releases/download/auto-pdf-preview/VOTable-draft.pdf">https://github.com/ivoa-std/VOTable/releases/download/auto-pdf-preview/VOTable-draft.pdf</a>
,<br>
UTF-8 strings replace the previous ASCII-only strings.<br>
</p>
<p>If FITS cannot store UTF-8, <strong data-start="754"
data-end="836">lossless round-trip conversion from VOTable to
FITS will no longer be possible</strong>.<br>
Some limitations already exist (e.g., unsigned integer logical
types), but UTF-8 seems more critical.</p>
<p>My own use cases involve HEALPix-sorted and indexed BINTABLEs,
used to build HATS products on the fly<br>
or intermediate HiPS catalogue representations from VizieR data
(which will contain more and more UTF-8).<br>
* HATS: <a class="moz-txt-link-freetext" href="https://www.ivoa.net/documents/Notes/HATS/">https://www.ivoa.net/documents/Notes/HATS/</a><br>
* HiPS catalogue: <a class="moz-txt-link-freetext" href="https://www.ivoa.net/documents/HiPS/">https://www.ivoa.net/documents/HiPS/</a><br>
* VizieR: <a class="moz-txt-link-freetext" href="https://vizier.u-strasbg.fr/">https://vizier.u-strasbg.fr/</a><br>
<br>
</p>
<p># Possible Solutions</p>
<p>## 1. Use UTF-8 in existing `TFORMn=rA`</p>
<p>As in VOTable 1.6, interpret `r` as a number of bytes instead of
a number of characters.<br>
May break truncation operations (e.g. a TDISPn display width) if a
multi-byte UTF-8 character is split.</p>
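<p>To illustrate the truncation issue, here is a minimal Python
sketch of a boundary-aware cut. The function name `truncate_utf8` is
hypothetical and not part of any FITS library; it only assumes that
the field holds valid UTF-8.</p>
<pre>
```python
def truncate_utf8(raw: bytes, width: int) -> bytes:
    """Cut raw to at most width bytes without splitting a UTF-8 character."""
    if width >= len(raw):
        return raw
    cut = max(width, 0)
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx, so their
    # two top bits equal 2. If the first dropped byte is a continuation
    # byte, the cut falls inside a character: step back to its lead byte.
    while cut > 0 and raw[cut] >> 6 == 2:
        cut -= 1
    return raw[:cut]


if __name__ == "__main__":
    field = "Ingénieur".encode("utf-8")
    # A naive field[:4] would end with the lone lead byte 0xC3.
    print(truncate_utf8(field, 4))
    print(truncate_utf8(field, 5).decode("utf-8"))
```
</pre>
<p>A reader that truncates with a plain byte slice instead would
leave a dangling lead byte, which a strict UTF-8 decoder rejects.</p>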
<p>## 2. Logical type "UTF-8" backed by a byte array</p>
<p>TFORMn = rB <br>
TLOGTn = 'UTF-8' / LOGT stands for LOGical Type<br>
<br>
Unaware readers see a byte array; UTF-8 aware readers interpret it
as a string.<br>
Introduces two string types in FITS (ASCII and UTF-8).</p>
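<p>A reader-side sketch of this idea in Python, under stated
assumptions: the header is modelled as a plain dict, `decode_cell` is
a hypothetical helper, TLOGTn is the keyword proposed above (not an
existing FITS keyword), and NUL padding of the byte array is my
assumption.</p>
<pre>
```python
def decode_cell(header: dict, n: int, raw: bytes) -> object:
    """Interpret one cell of column n, honouring the proposed TLOGTn keyword."""
    tform = str(header.get(f"TFORM{n}", "")).strip()
    logical = str(header.get(f"TLOGT{n}", "")).strip().upper()
    if tform.endswith("B") and logical == "UTF-8":
        # UTF-8 aware reader: drop the (assumed) NUL padding, then decode.
        return raw.rstrip(b"\x00").decode("utf-8")
    if tform.endswith("A"):
        # Classic character column: ASCII, NUL-terminated or space-padded.
        return raw.split(b"\x00", 1)[0].decode("ascii").rstrip(" ")
    # Unaware reader, or any other column type: keep the raw bytes.
    return raw


if __name__ == "__main__":
    raw = "Tél".encode("utf-8").ljust(8, b"\x00")
    print(decode_cell({"TFORM1": "8B", "TLOGT1": "UTF-8"}, 1, raw))
    print(decode_cell({"TFORM1": "8B"}, 1, raw))
```
</pre>
<p>Without the TLOGTn keyword the same cell comes back as bytes,
which is the fallback behaviour an unaware reader would show.</p>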
<p>## 3. New TFORM type (e.g., `TFORMn=rU`)</p>
<p>Guaranteed to break current readers, which do not know the new
type code.</p>
<p><br>
</p>
<p># Existing Implementations</p>
<p> * TOPCAT/STILTS (Java): Prototype supports Solutions 1 and 2 for
read/write (private communication with Mark Taylor).<br>
* fitstable (Rust): Supports Solutions 1 and 2 for reading
(<a class="moz-txt-link-freetext" href="https://github.com/cds-astro/cds-fitstable-rust">https://github.com/cds-astro/cds-fitstable-rust</a>).<br>
* VizieR: Appears to provide UTF-8 in TFORMn=rA columns (Solution
1).<br>
* ??</p>
<p><br>
</p>
<p># Feedback Requested</p>
<p>I am curious about:<br>
* other possible approaches<br>
* fitsbits opinions on the most practical solution<br>
* other people interested in having UTF-8 in BINTABLE columns<br>
<br>
Currently, Solution 1 seems the simplest and Solution 2 the
safest,<br>
but I welcome constructive comments and experience from the
community.</p>
<p>Best regards,</p>
<div class="moz-signature">-- <br>
<p
style="font-family: Arial, sans-serif; color: black; font-size: 14px;"><span
style="font-weight: bold">Francois-Xavier Pineau</span><br>
Ingénieur de Recherche<br>
Tél : +33 (0)3 68 85 24 14,<br>
<a href="mailto:francois-xavier.pineau@astro.unistra.fr"
title="Contacter francois-xavier.pineau@astro.unistra.fr"
class="moz-txt-link-freetext">francois-xavier.pineau@astro.unistra.fr</a><br>
<br>
Centre de Données astronomiques de Strasbourg (CDS)<br>
11, rue de l'Université - E03<br>
<br>
<br>
</p>
</div>
</body>
</html>