Unicode in VOTable

Thu Aug 14 09:35:21 PDT 2014

On Thu, 14 Aug 2014, Markus Demleitner wrote:

> Now, if we go this way: Why have a new type at all?  I'd maintain no
> existing valid VOTable would break if we just said something essentially
> like:
> 
>   VOTable considers char as byte streams that can be decoded from utf-8
>   for presentation purposes.   TABLEDATA encoding is presentation.
>   arraysize refers to the length of the bytestream always, never to
>   the length of any unicode code sequence decodeable from the byte
>   stream.

Yes, I think that would work.  "TABLEDATA encoding is presentation"
seems like a rather radical statement in terms of the way one
usually thinks about VOTable, but I can't think of any actual
negative consequences.
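
To make the arraysize point concrete, here is a minimal sketch in Python 3 of the difference between byte length and code point length. The helper names are hypothetical and come from no VOTable library; the only assumption is that arraysize counts UTF-8 bytes, as the quoted text proposes.

```python
def utf8_byte_length(text: str) -> int:
    """Number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def fits_arraysize(text: str, arraysize: int) -> bool:
    """True if text fits a char field whose arraysize counts bytes.

    Assumption: arraysize is the byte length, per the quoted proposal.
    """
    return utf8_byte_length(text) <= arraysize


# U+00C5 (A with ring above) takes two bytes in UTF-8, so this value
# has 6 code points but 7 bytes.
value = "\u00c5 5007"
print(len(value), utf8_byte_length(value))
print(fits_arraysize(value, 6), fits_arraysize(value, 7))
```

A writer that sized the field by counting characters would declare it one byte too short here.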

Note though that this change does lose you something: the possibility
to store in a VOTable text data that is known and declared to be
7-bit ASCII.  If you're in FITS'n'FORTRAN land such things can
be useful.  However, I don't know how many people are really relying
on that in practice at present.
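
Without such a declaration, a consumer that needs 7-bit ASCII would presumably have to check the bytes itself. A minimal sketch in Python 3, with a hypothetical function name:

```python
def is_seven_bit_ascii(raw: bytes) -> bool:
    """True if every byte has its high bit clear (values 0 to 127)."""
    return all(b < 0x80 for b in raw)


print(is_seven_bit_ascii(b"NGC 1275"))
print(is_seven_bit_ascii("\u00c5".encode("utf-8")))
```

This is a check at read time, not a guarantee made by the file, which is exactly what is lost.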

> And then we'd have to go on to the ghastly array considerations ("To
> decode multidimensional arrays coming from tabledata serialised
> tables, first create a bytestream by encoding as canonical utf-8 and
> then...").

Agreed, something like that should go in, but it's a clarification of
the scheme implied by the earlier text, not an additional complication.
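
The quoted text does not spell out the steps after "then...", so the following is only one possible reading, sketched in Python 3 under stated assumptions: the TABLEDATA cell text is encoded as UTF-8, the byte stream is cut into fixed-width elements, and each element is decoded for presentation. The function name and the choice to substitute U+FFFD for undecodable bytes are mine, not anything agreed in this thread.

```python
from typing import List


def split_fixed_width(cell_text: str, width: int) -> List[str]:
    """Split a TABLEDATA cell into elements of `width` bytes each.

    Assumption: element boundaries are counted in UTF-8 bytes, so a
    multi-byte character may be cut in two; the damaged pieces are
    decoded with U+FFFD replacement characters instead of failing.
    """
    raw = cell_text.encode("utf-8")
    chunks = [raw[i:i + width] for i in range(0, len(raw), width)]
    return [chunk.decode("utf-8", errors="replace") for chunk in chunks]


# Pure ASCII splits cleanly into two 4-byte elements.
print(split_fixed_width("abcdefgh", 4))

# Here U+00C5 straddles the boundary between the two elements.
print(split_fixed_width("abc\u00c5efg", 4))
```

The second call shows where the ghastliness comes from: a byte-counted boundary can land in the middle of a character.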

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/