Unicode in VOTable

Mon Aug 18 09:55:03 PDT 2014

On Fri, 15 Aug 2014, Walter Landry wrote:

> Mark Taylor <m.b.taylor at bristol.ac.uk> wrote:
> > On Thu, 14 Aug 2014, Markus Demleitner wrote:
> > 
> >> Now, if we go this way: Why have a new type at all?  I'd maintain no
> >> existing valid VOTable would break if we just said something essentially
> >> like:
> >> 
> >>   VOTable considers char as byte streams that can be decoded from utf-8
> >>   for presentation purposes.   TABLEDATA encoding is presentation.
> >>   arraysize refers to the length of the bytestream always, never to
> >>   the length of any unicode code sequence decodeable from the byte
> >>   stream.
> > 
> > Yes, I think that would work.  "TABLEDATA encoding is presentation"
> > seems like a rather radical statement in terms of the way one
> > usually thinks about VOTable, but I can't think of any actual
> > negative consequences.
> 
> This sounds a lot like what I proposed back in March, so I like it
> too ;)  It would be good if we could do the same thing for unicodeChar
> and UTF-16.

Maybe.  UCS-2, though it's archaic (obsolete?), does retain the
assurance that the number of characters can be determined from
the arraysize.  If you can do UTF-8 in char then it could be
worth retaining what's currently unicodeChar for that purpose,
especially since it's not likely to be used for any other reason
when there's a UTF-8 alternative.
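
To make the distinction concrete, here is a minimal Python sketch of
the two counting rules under discussion.  It assumes the proposal
quoted above (arraysize for char counts UTF-8 bytes) and the
fixed-width property of UCS-2 (two bytes per character, Basic
Multilingual Plane only).  The function names are hypothetical and
exist only for this illustration; nothing here is taken from the
VOTable standard or from any VOTable library.

```python
# Illustrative sketch only: function names are hypothetical.

def utf8_byte_length(text: str) -> int:
    """Length of the UTF-8 byte stream, i.e. what arraysize would
    count for char under the proposal quoted above."""
    return len(text.encode("utf-8"))


def ucs2_char_count(text: str) -> int:
    """Number of 2-byte UCS-2 units, which equals the number of
    characters because UCS-2 is fixed width."""
    # UCS-2 cannot represent code points outside the BMP.
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("not representable in UCS-2")
    # For BMP-only text, UTF-16 and UCS-2 produce the same bytes.
    return len(text.encode("utf-16-be")) // 2


name = "\u00c5ngstr\u00f6m"       # 8 characters, 2 of them non-ASCII
print(len(name))                  # 8
print(utf8_byte_length(name))     # 10: byte count differs from char count
print(ucs2_char_count(name))      # 8: size fixes the character count
```

With UTF-8 the byte length only gives an upper bound on the number
of characters, whereas with UCS-2 the two are tied together, which
is the assurance mentioned above.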

I've updated the VOTableIssues13 wiki page a little bit in view
of this thread.  Anybody else feel free to edit away too.

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/