Unicode in VOTable

Mark Taylor m.b.taylor at bristol.ac.uk
Fri Jun 13 10:21:38 CEST 2025


On Wed, 11 Jun 2025, Mark Taylor via apps wrote:

>     The downside is that a FIELD with datatype="char" arraysize="8"
>     can't store an 8-character string if those characters are emojis.

Following up on this point: I don't expect it to affect very many
tables/columns.  In many cases a fixed-width string will have some
well-constrained format such as a sexagesimal designation, an
ISO-8601 date with a fixed precision, a bibcode, a UUID or a
version string.  For these examples and many similar ones it is
known that the content will be 7-bit ASCII.  I haven't attempted to
gather evidence for this, but my guess would be that the majority
of fixed-width strings in e.g. TAP tables fall into that category.

String fields that might contain non-ASCII characters (names,
descriptions, comments) are more likely to be the sort of thing
for which a fixed-length value is not so appropriate anyway.
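
To make the byte counting concrete, here is a minimal Python sketch.
It assumes, as the quoted text implies, that arraysize for
datatype="char" would count UTF-8 encoded bytes rather than
characters.  The helper name fits_in_char_field and the sample
values are made up for illustration; they are not part of VOTable
or of any library.

```python
def utf8_length(value: str) -> int:
    """Number of bytes needed to store value as UTF-8."""
    return len(value.encode("utf-8"))


def fits_in_char_field(value: str, arraysize: int) -> bool:
    """True if value fits in a fixed-width field of arraysize bytes.

    Assumption: arraysize counts UTF-8 bytes, not characters.
    """
    return utf8_length(value) <= arraysize


# Illustrative values only: a fixed-precision ISO-8601 date (pure ASCII),
# a name with two non-ASCII letters, and eight emoji characters.
samples = {
    "ISO-8601 date": "2025-06-13",
    "non-ASCII name": "\u00c5ngstr\u00f6m",
    "eight emojis": "\U0001F600" * 8,
}

for label, text in samples.items():
    n_chars = len(text)
    n_bytes = utf8_length(text)
    # Size the field by character count, as a table author might naively do.
    ok = fits_in_char_field(text, n_chars)
    print(f"{label}: {n_chars} chars, {n_bytes} bytes, "
          f"fits arraysize={n_chars}: {ok}")
```

Under that assumption only the ASCII value fits a field sized by its
character count; the other two need more bytes than characters.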

--
Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk          https://www.star.bristol.ac.uk/mbt/

