Unicode in VOTable
Dave Morris
dave.morris at metagrid.co.uk
Mon Aug 25 17:45:31 PDT 2014
On 2014-08-25 10:02, Markus Demleitner wrote:
>>
>> If I have a SQL database with a column defined as CHAR(3),
>>
>> CREATE TABLE my_table (
>> xyz CHAR(3)
>> );
>>
>> How would I describe that as a FIELD ?
>>
>> <FIELD name='xyz' datatype='char' arraysize='3'>
>>
>> <FIELD name='xyz' datatype='char' arraysize='12'>
>>
>> <FIELD name='xyz' datatype='char' encoding='utf-8' arraysize='3'>
>>
>> <FIELD name='xyz' datatype='char' encoding='utf-8' arraysize='12'>
>
> First, I'd hope there's no "encoding" attribute to FIELD, so let's
> discount the cases with that attribute.
>
> Other than that: I'd say do arraysize="*" here if your database
> actually stores codepoints; it's probably more space-efficient than
> any fixed size. Of course, you don't have fixed-size records then
> any more. If you want these, I'd say store UTF-8 in your database.
>
One use case for providing user data space is letting a user query a
TAP service and upload the resulting VOTable to another service.
For that to work, the VOTable header needs to contain enough metadata
for the receiving service to re-create the original column
definition:
CREATE TABLE my_table (
xyz CHAR(3)
);
If the metadata contains only arraysize='*', then all the receiving
service can do is store the data as a TEXT blob:
CREATE TABLE my_table (
xyz TEXT
);
Once the data has been stored as TEXT, the information from the original
column definition is lost.
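
To make the difference concrete, here is a minimal sketch of what a
receiving service might do with the FIELD metadata. The function name
sql_column_for_field and the choice of CHAR/VARCHAR/TEXT as target types
are my own assumptions for illustration, not part of any TAP or VOTable
implementation. It relies only on the VOTable arraysize conventions
('3' for fixed length, '12*' for variable length up to 12, '*' for
unbounded) and handles datatype='char' only.

```python
import xml.etree.ElementTree as ET


def sql_column_for_field(field_xml: str) -> str:
    """Map a VOTable char FIELD to a SQL column definition (illustrative only)."""
    field = ET.fromstring(field_xml)
    if field.get("datatype") != "char":
        raise ValueError("this sketch only handles datatype='char'")

    name = field.get("name")
    # Assumption: a missing arraysize is treated as a single character.
    arraysize = field.get("arraysize", "1")

    if arraysize == "*":
        # No length information at all: fall back to an unbounded type.
        sql_type = "TEXT"
    elif arraysize.endswith("*"):
        # Variable length with an upper bound, e.g. '12*'.
        sql_type = f"VARCHAR({int(arraysize[:-1])})"
    else:
        # Fixed length, e.g. '3': the original CHAR(n) can be re-created.
        sql_type = f"CHAR({int(arraysize)})"

    return f"{name} {sql_type}"


print(sql_column_for_field("<FIELD name='xyz' datatype='char' arraysize='3'/>"))
print(sql_column_for_field("<FIELD name='xyz' datatype='char' arraysize='*'/>"))
```

With arraysize='3' the sketch can rebuild the CHAR(3) column; with
arraysize='*' it has nothing to go on and ends up with TEXT.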
At some point we will also run into new interoperability problems,
because not all database platforms support an unbounded TEXT type, and
different platforms impose different limits and restrictions on the
maximum string length.
--------
Dave Morris
Software Developer
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------