Unicode in VOTable
wlandry at caltech.edu
Mon Aug 18 11:08:21 PDT 2014
Mark Taylor <M.B.Taylor at bristol.ac.uk> wrote:
> On Fri, 15 Aug 2014, Walter Landry wrote:
>> Mark Taylor <m.b.taylor at bristol.ac.uk> wrote:
>> > On Thu, 14 Aug 2014, Markus Demleitner wrote:
>> >> Now, if we go this way: Why have a new type at all? I'd maintain no
>> >> existing valid VOTable would break if we just said something essentially
>> >> like:
>> >> VOTable considers char as byte streams that can be decoded from utf-8
>> >> for presentation purposes. TABLEDATA encoding is presentation.
>> >> arraysize refers to the length of the bytestream always, never to
>> >> the length of any unicode code sequence decodeable from the byte
>> >> stream.
>> > Yes, I think that would work. "TABLEDATA encoding is presentation"
>> > seems like a rather radical statement in terms of the way one
>> > usually thinks about VOTable, but I can't think of any actual
>> > negative consequences.
>> This sounds a lot like what I proposed back in March, so I like it
>> too ;) It would be good if we could do the same thing for unicodeChar
>> and UTF-16.
> Maybe. UCS-2, though it's archaic (obsolete?), does retain the
> assurance that the number of characters can be determined from
> the arraysize.
I do not know if you can even create UCS-2 these days without going
through gymnastics. For example, Java only guarantees support for
ASCII, ISO-8859-1, UTF-8, UTF-16, UTF-16LE, and
UTF-16BE. So what is probably happening is that no one is actually
writing UCS-2. They are writing UTF-16 and not noticing the
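
To make the UCS-2 versus UTF-16 point concrete, here is a minimal Python
sketch. The helper names are made up for illustration and are not part of
any VOTable library. It shows that for text confined to the Basic
Multilingual Plane the UTF-16 code unit count equals the character count
(so the bytes are also valid UCS-2), while a single character outside the
BMP takes two code units.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units in the UTF-16 encoding of text."""
    # "utf-16-be" writes no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-be")) // 2


def fits_in_ucs2(text: str) -> bool:
    """True if every code point lies in the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Illustrative strings only: one BMP-only, one with a character above U+FFFF.
bmp_text = "\u00c5ngstr\u00f6m"      # 8 code points, all in the BMP
astral_text = "star \U0001F31F"      # 6 code points, the last one outside the BMP

for label, text in (("BMP", bmp_text), ("astral", astral_text)):
    print(label, len(text), utf16_code_units(text), fits_in_ucs2(text))
```

For the first string both counts agree, which is why a writer emitting
UTF-16 would look like a UCS-2 writer until a non-BMP character shows up.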
> If you can do UTF-8 in char then it could be worth retaining what's
> currently unicodeChar for that purpose, especially since it's not
> likely to be used for any other reason when there's a UTF-8
Some languages or environments (Java, C#, PowerShell) work more
naturally in UTF-16. But they can also handle UTF-8, so if we wanted
to deprecate unicodeChar, that would also be fine with me.
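
As a rough illustration of what the two options mean for arraysize, the
Python sketch below compares the same string stored as UTF-8 bytes in a
char field and as 2-byte units in a unicodeChar field. It assumes Markus's
reading that arraysize for char counts bytes, and that unicodeChar counts
2-byte elements. The function names are hypothetical.

```python
def char_arraysize(text: str) -> int:
    """Hypothetical: arraysize of a 'char' field holding text as UTF-8 bytes."""
    return len(text.encode("utf-8"))


def unicodechar_arraysize(text: str) -> int:
    """Hypothetical: arraysize of a 'unicodeChar' field, in 2-byte elements."""
    return len(text.encode("utf-16-be")) // 2


# Illustrative value only: "Strasse" spelled with U+00DF, then a Greek alpha.
name = "Stra\u00dfe \u03b1"

print("code points:          ", len(name))
print("char arraysize:       ", char_arraysize(name))
print("unicodeChar arraysize:", unicodechar_arraysize(name))

# Either encoding round-trips the text, so the choice affects sizes, not content.
assert name.encode("utf-8").decode("utf-8") == name
assert name.encode("utf-16-be").decode("utf-16-be") == name
```

Under these assumptions neither arraysize is a character count in general:
the UTF-8 byte length exceeds it as soon as non-ASCII text appears, and the
UTF-16 unit count exceeds it for characters outside the BMP.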