Unicode in VOTable

Francois Ochsenbein francois.ochsenbein at gmail.com
Fri Jun 13 20:11:30 CEST 2025


Hello everybody,

If I may give a point of view from the original version of VOTable (a
looooong time ago...): the Unicode datatype (unicodeChar) was introduced
to allow the use of internationally accepted characters beyond ASCII.
At that time UTF-8 was not a common way of dealing with Unicode, but it
is nowadays almost universally used, and its usage in VOTable would
certainly be quite useful.

However, I feel it's important to keep the original definition of the
"char" datatype (restricted to 7-bit ASCII bytes), to ensure that
existing applications will not give weird results if "char" is extended
to UTF-8 (and the risk is not only that of a truncation in the middle
of a multibyte codepoint!). Algorithms for processing strings are also
generally much more efficient when the strings are known to be pure
ASCII, so the knowledge that a string is ASCII is quite useful.
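
To make the truncation risk concrete, here is a minimal sketch in
Python. The function names and the 5-byte field width are hypothetical,
chosen only for illustration; they are not part of VOTable or of any
VOTable library. It shows a fixed-width byte cut landing inside a
two-byte UTF-8 sequence, and the cheap check that tells a reader the
data is pure ASCII (one byte per character).

```python
# Hypothetical helpers for illustration only; not a VOTable API.

def truncate_bytes(text: str, width: int) -> bytes:
    """Naively cut the UTF-8 encoding of text to a fixed byte width."""
    return text.encode("utf-8")[:width]


def is_pure_ascii(data: bytes) -> bool:
    """True when every byte is 7-bit ASCII (one byte per character)."""
    return data.isascii()


name = "François"              # 8 characters, but 9 bytes in UTF-8
cut = truncate_bytes(name, 5)  # the cut falls between the 2 bytes of "ç"

try:
    cut.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    # The truncated field is no longer valid UTF-8.
    decoded_ok = False

print(len(name), len(name.encode("utf-8")), decoded_ok)
```

An application written for 7-bit "char" fields can cut at any byte
offset safely; with UTF-8 content the same cut can produce an invalid
sequence, as above.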

What would be the benefit of extending the "char" datatype to UTF-8,
compared to a new "utf8" datatype, or a redefinition of the unicodeChar
datatype?

François Ochsenbein

