Unicode in VOTable

Dave Morris dave.morris at metagrid.co.uk
Mon Aug 25 17:45:31 PDT 2014


On 2014-08-25 10:02, Markus Demleitner wrote:
>> 
>> If I have a SQL database with a column defined as CHAR(3),
>> 
>>     CREATE TABLE my_table (
>>         xyz CHAR(3)
>>         );
>> 
>> How would I describe that as a FIELD ?
>> 
>>     <FIELD name='xyz' datatype='char' arraysize='3'>
>> 
>>     <FIELD name='xyz' datatype='char' arraysize='12'>
>> 
>>     <FIELD name='xyz' datatype='char' encoding='utf-8' arraysize='3'>
>> 
>>     <FIELD name='xyz' datatype='char' encoding='utf-8' arraysize='12'>
> 
> First, I'd hope there's no "encoding" attribute to FIELD, so let's
> discount the cases with that attribute.
> 
> Other than that: I'd say do arraysize="*" here if your database
> actually stores codepoints; it's probably more space-efficient than
> any fixed size.  Of course, you don't have fixed-size records then
> any more.  If you want these, I'd say store UTF-8 in your database.
> 
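
For reference, the 3 versus 12 in the FIELD options above is presumably
the gap between counting codepoints and counting UTF-8 bytes: a single
codepoint takes between 1 and 4 bytes in UTF-8, so three codepoints need
anywhere from 3 to 12 bytes. A minimal sketch in Python; the helper name
utf8_length and the sample strings are made up for illustration:

```python
# Compare codepoint count with UTF-8 byte count for 3-character strings.
# utf8_length is a hypothetical helper, not part of any VOTable library.

def utf8_length(text):
    """Return the number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))

samples = [
    "abc",                              # ASCII: 1 byte per codepoint
    "\u00e9\u00e8\u00ea",               # accented Latin: 2 bytes each
    "\u65e5\u672c\u8a9e",               # CJK: 3 bytes each
    "\U0001F600\U0001F601\U0001F602",   # outside the BMP: 4 bytes each
]

for text in samples:
    print(len(text), "codepoints ->", utf8_length(text), "bytes")
```

Every sample is three codepoints long, so all of them fit a CHAR(3)
column that counts characters, but only the first fits in 3 bytes.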

One use case for providing user data space is to let a user query a TAP 
service and upload the resulting VOTable to another service.

For this to work, the VOTable header needs to contain enough metadata 
for the receiving service to re-create the original column 
definition.

     CREATE TABLE my_table (
         xyz CHAR(3)
         );

If the metadata just contains arraysize='*', then all the receiving 
service can do is store the data as a TEXT blob.

     CREATE TABLE my_table (
         xyz TEXT
         );

Once the data has been stored as TEXT, the information from the original 
column definition is lost.
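
To make the difference concrete, here is a minimal sketch in Python of
how a receiving service might turn the arraysize of a char FIELD into a
column type. The function name sql_column_type and the choice of CHAR,
VARCHAR and TEXT as targets are assumptions for illustration; nothing in
the VOTable standard or in TAP prescribes this mapping, and
multi-dimensional arraysize values such as '3x4' are deliberately not
handled.

```python
import re

def sql_column_type(arraysize):
    """Map the arraysize of a char FIELD to a SQL column type.

    Hypothetical mapping, for illustration only:
      '3'  -> CHAR(3)     fixed length
      '3*' -> VARCHAR(3)  variable length with an upper bound
      '*'  -> TEXT        unbounded, original length is unknown
    """
    value = arraysize.strip()
    if value == "*":
        return "TEXT"
    match = re.fullmatch(r"(\d+)(\*?)", value)
    if match is None:
        raise ValueError("unsupported arraysize: %r" % arraysize)
    length, variable = match.groups()
    if variable:
        return "VARCHAR(%s)" % length
    return "CHAR(%s)" % length

for size in ("3", "3*", "*"):
    print("arraysize=%r -> xyz %s" % (size, sql_column_type(size)))
```

Only the first two cases carry a length that survives the round trip;
with '*' the receiving side has nothing to work from but TEXT.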

At some point this will surface new interoperability problems: not all 
database platforms support an unbounded TEXT type, and different 
platforms impose different limits and restrictions on the maximum 
string length.


--------
Dave Morris
Software Developer
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------



