Unicode in VOTable

Dave Morris dave.morris at metagrid.co.uk
Mon Aug 25 16:03:51 PDT 2014


On 2014-08-14 09:17, Markus Demleitner wrote:
> 
> Now, if we go this way: Why have a new type at all?  I'd maintain no
> existing valid VOTable would break if we just said something essentially
> like:
> 
>   VOTable considers char as byte streams that can be decoded from utf-8
>   for presentation purposes.   TABLEDATA encoding is presentation.
>   arraysize refers to the length of the bytestream always, never to
>   the length of any unicode code sequence decodeable from the byte
>   stream.
> 

I'm sorry, but I don't think this is a good way to solve this.

Changing the meaning of FIELD/@arraysize from 'element count for 
everything' to 'element count for some things and byte count for 
other things' is setting a trap for ourselves.

It breaks the Principle of least astonishment.
https://en.wikipedia.org/wiki/Principle_of_least_astonishment

Our science users are not going to understand this.
We shouldn't require them to calculate the size of a UTF-8-encoded 
byte stream in order to set the FIELD/@arraysize in their XML text 
document.
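To make the gap concrete, here is a minimal sketch comparing the two readings of arraysize for a few strings. The helper names (element_count, utf8_byte_count, fits) are hypothetical and not part of any VOTable library, and treating an "element" as one Unicode code point is an assumption (user-perceived characters can differ when combining sequences are involved).

```python
# Hypothetical helpers, not from any VOTable library: compare the two
# possible meanings of FIELD/@arraysize for a char value.
# Assumption: one "element" == one Unicode code point.

def element_count(value: str) -> int:
    """Number of code points, roughly what a user counts by eye."""
    return len(value)

def utf8_byte_count(value: str) -> int:
    """Length of the UTF-8 encoded byte stream (the proposed meaning)."""
    return len(value.encode("utf-8"))

def fits(value: str, arraysize: int, count_bytes: bool) -> bool:
    """Check a value against arraysize under either interpretation."""
    size = utf8_byte_count(value) if count_bytes else element_count(value)
    return size <= arraysize

# Escapes avoid any ambiguity about precomposed vs combining characters.
names = ["Vega", "\u00c5ngstr\u00f6m", "\u03b1 Centauri"]
for name in names:
    print(f"{name}: {element_count(name)} code points, "
          f"{utf8_byte_count(name)} UTF-8 bytes")

# A user who counts 8 characters in "Ångström" and writes arraysize="8"
# is fine under element-count semantics, but not under byte semantics.
print(fits("\u00c5ngstr\u00f6m", 8, count_bytes=False))  # True
print(fits("\u00c5ngstr\u00f6m", 8, count_bytes=True))   # False
```

Plain ASCII names like "Vega" give the same answer either way, which is exactly why the mismatch would go unnoticed until a non-ASCII value shows up.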

If we adopt this then I can guarantee we will see lots of user generated 
VOTables with invalid FIELD/@arraysize.


Cheers,
Dave

--------
Dave Morris
Software Developer
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------




More information about the apps mailing list