Unicode in VOTable
Dave Morris
dave.morris at metagrid.co.uk
Mon Aug 25 16:03:51 PDT 2014
On 2014-08-14 09:17, Markus Demleitner wrote:
>
> Now, if we go this way: Why have a new type at all? I'd maintain no
> existing valid VOTable would break if we just said something
> essentially
> like:
>
> VOTable considers char as byte streams that can be decoded from utf-8
> for presentation purposes. TABLEDATA encoding is presentation.
> arraysize refers to the length of the bytestream always, never to
> the length of any unicode code sequence decodeable from the byte
> stream.
>
I'm sorry, but I don't think this is a good way to solve this.
Changing the meaning of FIELD/@arraysize from 'element count for
everything' to 'element count for some things and byte count for
other things' is setting a trap for ourselves.
It breaks the Principle of least astonishment.
https://en.wikipedia.org/wiki/Principle_of_least_astonishment
Our science users are not going to understand this.
We shouldn't require them to calculate the size of a UTF-8 encoded
bytestream in order to set the FIELD/@arraysize in their XML text
document.
If we adopt this then I can guarantee we will see lots of user-generated
VOTables with invalid FIELD/@arraysize values.
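To illustrate the gap between the two counts, here is a minimal Python
sketch. The sample strings and the helper names char_count and
utf8_byte_count are made up for this example and are not taken from any
VOTable library or from the proposal itself. It only shows that the
number of characters a user sees and the length of the UTF-8 encoded
bytestream diverge as soon as a value contains non-ASCII characters.

```python
# Hypothetical column values, written with escapes so the code points
# are unambiguous (precomposed A-ring, o-umlaut and Greek alpha).
SAMPLES = ["Tycho", "\u00c5ngstr\u00f6m", "\u03b1 Centauri"]


def char_count(text: str) -> int:
    """Number of Unicode code points, i.e. what a user would count by eye."""
    return len(text)


def utf8_byte_count(text: str) -> int:
    """Length of the UTF-8 encoded bytestream for the same value."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    for value in SAMPLES:
        print(f"{value!r}: {char_count(value)} characters, "
              f"{utf8_byte_count(value)} bytes in UTF-8")
```

For pure ASCII values the two numbers agree, which is why the mismatch
is easy to miss until a name with an accent turns up.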
Cheers,
Dave
--------
Dave Morris
Software Developer
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------