Unicode in VOTable
Walter Landry
wlandry at caltech.edu
Mon Apr 7 15:45:00 PDT 2014
Hi Mark,
My apologies for taking a while to get back to this.
Mark Taylor <m.b.taylor at bristol.ac.uk> wrote:
>> > It's possible that revisiting this in a future version of the standard
>> > might change that, though for reasons of backward compatibility that
>> > might be problematic.
>> >
>> > Having said that, I wouldn't be too surprised to find that sloppily
>> > coded VOTable readers (possibly including mine, I haven't checked)
>> > in unicode-friendly languages might actually not do that, and treat
>> > such arrays as UTF-8 strings because the language byte array
>> > handling naturally makes such interpretations.
>>
>> What I would like is a revision to the standard. It sounds like you
>> are agreeing with me that UTF-8 is, to some degree, existing usage.
>> In that case, specifying UTF-8 would be removing ambiguities and
>> codifying existing practice, not inventing new usage.
>
> "to some degree" maybe, but I suspect not very much, and to the extent
> that it is, it's certainly in contravention of what the standard says.
> So I'm not very comfortable with the idea of adjusting the definition
> in this way.
UTF-8 is 100% backwards compatible with the existing standard. I do
not understand why you are uncomfortable extending the standard in
this way.
>> > Since unicodeChar is supposed to contain unicode strings, the same
>> > reasoning doesn't apply to datatype="unicodeChar". Using UTF-16
>> > in unicodeChar follows the spirit and letter of the standard
>> > in the (overwhelmingly common?) case that none of the characters
>> > require surrogates. If surrogate pairs are required, there is
>> > a fair chance it will work anyway. So if you want to put unicode
>> > into a BINARY2 serialized VOTable, I think you should use
>> > unicodeChar arrays with a UTF-16 or maybe UCS-2 encoding.
>>
>> I can always write UTF-16 characters for my own consumption. What I
>> want is to be able to demand other readers to understand it as well,
>> in the same way that I can demand other readers to understand boolean
>> or floatComplex.
>>
>> What I want is revisions to the standard to make, for example, VOTAble
>> 1.4. The first step towards that is to get consensus here that the
>> revision is a good idea. Do you (or anyone else) agree these are good
>> revisions, or do you still have some doubts?
>
> As above: my feeling is that an adjustment from UCS-2 to UTF-16 for
> the unicodeChar type would be a good change, but I have doubts about
> redefining the char type. Other people may have different opinions.
> But if you want to write something now which there's a good chance
> will work with existing readers and will look pretty similar in
> future versions of the standard (if any) I'd advise use of unicodeChar
> and UTF-16.
I am not looking for something that _might_ work. I am proposing a
100% backwards compatible extension to the standard so it will
_definitely_ work.
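For reference, the UCS-2 versus UTF-16 distinction discussed above can be sketched in Python. The helper names are hypothetical, and big-endian byte order is an assumption about the binary serialization. For text with no code points above U+FFFF the two encodings produce identical 16-bit units; only characters beyond that range need a surrogate pair.

```python
def utf16_units(text):
    """Return the 16-bit code units of text encoded as big-endian UTF-16."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def needs_surrogates(text):
    """True if any character lies above U+FFFF, i.e. is not representable in UCS-2."""
    return any(ord(ch) > 0xFFFF for ch in text)


bmp_text = "\u03b1 Cen"          # Greek alpha: one 16-bit unit per character
astral_text = "\U0001D6FC Cen"   # a code point above U+FFFF

for text in (bmp_text, astral_text):
    units = utf16_units(text)
    print(needs_surrogates(text), len(text), len(units))
```

In the first case the number of code units equals the number of characters, so a UCS-2 reader and a UTF-16 reader agree. In the second case the array is one unit longer than the character count, which is where a strict UCS-2 reader could differ.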
> I have added a new page to the IVOA wiki with an entry on this topic:
>
> http://wiki.ivoa.net/twiki/bin/view/IVOA/VOTableIssues13
>
> If you or others have opinions, feel free to add them there, and if
> the VOTable standard is revised at some point in the future, those
> notes will be taken into account. Note however that there is not
> currently an activity leading towards a new revision of VOTable
> in the IVOA.
Thanks. I applied for credentials to make comments on that page.
Cheers,
Walter Landry