Unicode in VOTable

Walter Landry wlandry at caltech.edu
Fri Mar 7 11:00:39 PST 2014


Hello Everyone,

I tried sending this to votable at ivoa.net, but that mailing list seems
unattended and the message never went through.  In any case, the
VOTable Format Definition Version 1.3 contains the following statements:

   VOTables support two kinds of characters: ASCII 1-byte characters
   and Unicode (UCS-2) 2-byte characters.  Unicode is a way to
   represent characters that is an alternative to ASCII. It uses two
   bytes per character instead of one, it is strongly supported by XML
   tools, and it can handle a large variety of international
   alphabets.

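This is not actually true.  Unicode code points run up to U+10FFFF, so
a fixed-width encoding (UTF-32) requires 4 bytes per character.  There
are variable-width encodings, such as UTF-16, which often only require
2 bytes, but even UTF-16 requires 4 bytes (a surrogate pair) to express
any character outside the Basic Multilingual Plane.  UCS-2 cannot
express those characters at all.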
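
To make the byte counts concrete, here is a small Python sketch.  The
sample characters and the helper name encoded_sizes are my own choices
for illustration; nothing here comes from the VOTable document.

```python
# Sample characters, written as escapes so the source stays pure ASCII:
# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign),
# U+1D11E (musical G clef, outside the Basic Multilingual Plane).
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

def encoded_sizes(ch):
    """Return the number of bytes needed for ch in three encodings."""
    # The "-be" variants are used so that no byte-order mark is counted.
    return {enc: len(ch.encode(enc))
            for enc in ("utf-8", "utf-16-be", "utf-32-be")}

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The last sample needs 4 bytes in UTF-16, so a fixed 2-byte slot cannot
hold it.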

So, how would I express a generic Unicode character in a VOTable?  Do
I encode it as UTF-8 and disguise it as ASCII?
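
To be clear about what I mean by "disguise", here is a sketch in
Python.  The helper name utf8_view is hypothetical, and this is only an
illustration of the idea, not a claim about what VOTable permits.

```python
def utf8_view(text):
    """Encode text as UTF-8 and report whether every byte is 7-bit ASCII."""
    raw = text.encode("utf-8")
    return raw, all(b < 0x80 for b in raw)

# One character (U+00E9) becomes two bytes, both above 0x7F, so the
# result is a run of 1-byte values but not genuine ASCII.
raw, is_ascii = utf8_view("\u00e9")
print(len(raw), is_ascii)
```

A reader that treats each byte as one ASCII character would see two
characters here instead of one.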

Thanks,
Walter Landry

