Unicode in VOTable
Walter Landry
wlandry at caltech.edu
Mon Aug 18 11:08:21 PDT 2014
Mark Taylor <M.B.Taylor at bristol.ac.uk> wrote:
> On Fri, 15 Aug 2014, Walter Landry wrote:
>
>> Mark Taylor <m.b.taylor at bristol.ac.uk> wrote:
>> > On Thu, 14 Aug 2014, Markus Demleitner wrote:
>> >
>> >> Now, if we go this way: Why have a new type at all? I'd maintain no
>> >> existing valid VOTable would break if we just said something essentially
>> >> like:
>> >>
>> >> VOTable considers char as byte streams that can be decoded from utf-8
>> >> for presentation purposes. TABLEDATA encoding is presentation.
>> >> arraysize refers to the length of the bytestream always, never to
>> >> the length of any unicode code sequence decodeable from the byte
>> >> stream.
>> >
>> > Yes, I think that would work. "TABLEDATA encoding is presentation"
>> > seems like a rather radical statement in terms of the way one
>> > usually thinks about VOTable, but I can't think of any actual
>> > negative consequences.
>>
>> This sounds a lot like what I proposed back in March, so I like it
>> too ;) It would be good if we could do the same thing for unicodeChar
>> and UTF-16.
>
> Maybe.  UCS-2, though it's archaic (obsolete?), does retain the
> assurance that the number of characters can be determined from
> the arraysize.

I do not know if you can even create UCS-2 these days without going
through gymnastics. For example, Java
http://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
only guarantees support for US-ASCII, ISO-8859-1, UTF-8, UTF-16BE,
UTF-16LE, and UTF-16. So what is probably happening is that no one is
actually writing UCS-2. They are writing UTF-16 and not noticing the
difference, since the two only diverge for characters outside the
Basic Multilingual Plane, which UTF-16 encodes as surrogate pairs.
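
To make the difference concrete, here is a minimal Java sketch. The
class and method names are made up for illustration and are not part
of any VOTable library. It compares the number of 16-bit units in a
string with the number of characters; the two only disagree once a
character outside the Basic Multilingual Plane shows up.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Utf16Units {
    // Number of 16-bit UTF-16 code units (what a UCS-2 reader would count).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (actual characters).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes when serialized as UTF-16BE (no byte order mark).
    static int utf16beBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String bmp = "A";               // U+0041, inside the BMP
        String clef = "\uD834\uDD1E";   // U+1D11E, needs a surrogate pair

        System.out.println(codeUnits(bmp) + " unit(s), " + codePoints(bmp) + " character(s)");
        System.out.println(codeUnits(clef) + " unit(s), " + codePoints(clef) + " character(s)");
    }
}
```

For the second string an arraysize counted in 16-bit units would be 2,
even though there is only one character.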

> If you can do UTF-8 in char then it could be worth retaining what's
> currently unicodeChar for that purpose, especially since it's not
> likely to be used for any other reason when there's a UTF-8
> alternative.

Some languages or environments (Java, C#, PowerShell) work more
naturally in UTF-16. But they can also handle UTF-8, so if we wanted
to deprecate unicodeChar, that would also be fine with me.
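
As a rough illustration of how little work UTF-8 is from a UTF-16
environment, here is a small Java sketch. The class and method names
are hypothetical, and treating the byte count as the arraysize is
only the reading proposed earlier in this thread, not something the
current VOTable spec says.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class EncodedLengths {
    // Byte length of the UTF-8 encoding; under the proposal discussed in
    // this thread, this is what arraysize would count for a char field.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Byte length of the UTF-16BE encoding (no byte order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Decode UTF-8 bytes back into a Java (UTF-16) string.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "na\u00EFve";  // five characters, one of them non-ASCII

        System.out.println("characters:   " + word.codePointCount(0, word.length()));
        System.out.println("UTF-8 bytes:  " + utf8Bytes(word));
        System.out.println("UTF-16 bytes: " + utf16Bytes(word));

        // Round trip through UTF-8 gives back the same string.
        byte[] encoded = word.getBytes(StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + word.equals(fromUtf8(encoded)));
    }
}
```

Either way the byte length differs from the character count as soon as
non-ASCII text is involved, so the choice between the two is mostly a
matter of convenience.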

Cheers,
Walter Landry