Request for VOTable code review

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Wed Jan 8 14:19:40 CET 2020


Dear Apps,

On Wed, Dec 18, 2019 at 04:08:50PM +0000, Tom Donaldson wrote:
> Astropy and I would greatly appreciate if someone could have a look
> at this code, and enter a review here:
> https://github.com/astropy/astropy/pull/9505

I've put in an informal comment for now, in particular pointing to
our previous discussion on allowing UTF-8 in VOTable char (which I
still think is a good idea):
http://mail.ivoa.net/pipermail/apps/2014-October/001010.html

In sum: I'm convinced exposing char[] as strings rather than bytes is
absolutely the right thing to do, and they'd even have my vote for
decoding from UTF-8 rather than (VOTable-correct) ASCII.
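
To make the difference between the two decoding policies concrete,
here is a minimal Python sketch.  The helper name decode_char_cell and
its strict flag are made up for illustration; this is not what astropy
or the PR actually does.

```python
def decode_char_cell(raw: bytes, strict: bool = False) -> str:
    """Decode the raw bytes of one VOTable char[] cell into a str.

    Hypothetical helper, for illustration only.
    strict=True  -> ASCII only (the VOTable-correct reading)
    strict=False -> UTF-8, which yields the same result for pure ASCII
    """
    return raw.decode("ascii" if strict else "utf-8")


# Pure ASCII content decodes identically under both policies.
print(decode_char_cell(b"M31"), decode_char_cell(b"M31", strict=True))

# Non-ASCII content only survives the UTF-8 policy.
raw = "\u00c5ngstr\u00f6m".encode("utf-8")
print(decode_char_cell(raw))
try:
    decode_char_cell(raw, strict=True)
except UnicodeDecodeError as exc:
    print("strict ASCII decoding fails:", exc.reason)
```

Since ASCII is a subset of UTF-8, the lenient policy cannot change
the result for any document that is already VOTable-correct.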

> - Lack of direction on encoding
> - Inconsistency on sizing between TABLEDATA and BINARY serializations

...which, incidentally, is something we can't get around, and that we
already have with unicodeChar (no XML document I've ever seen uses
UCS-2, but it's what we require in BINARY2; I'll mention in passing
that UCS-2 these days isn't part of Unicode any more and then pretend
I hadn't said that).
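
As a rough illustration of why the sizes cannot agree across
serialisations, the following sketch compares the character count of
a string with its byte count in UTF-8 and in UCS-2.  The function
name sizes is made up; UTF-16-BE is used as a stand-in for UCS-2,
which is only equivalent for characters in the Basic Multilingual
Plane.

```python
def sizes(text: str) -> dict:
    """Return character and byte counts for text (illustration only)."""
    return {
        "characters": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # Stand-in for UCS-2: identical for BMP characters, two bytes each.
        "ucs2_bytes": len(text.encode("utf-16-be")),
    }


for sample in ("M31", "\u00c5"):
    print(repr(sample), sizes(sample))
```

A single non-ASCII character such as U+00C5 counts as one character
but takes two bytes in UTF-8, so an arraysize given in characters and
one given in bytes diverge as soon as non-ASCII content shows up.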


There is, however, a more sinister question here (related, but it
shouldn't block the astropy PR): What do you serialise Python 3
strings *into*?  Since you can't be sure that there's just ASCII in
these, it can't blindly be char[].  On the other hand,
unicodeChar[] as a VOTable type isn't pretty either, starting with
wasting one byte per char in well over 99% of the strings in use in
astronomy.
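
For illustration only, the brute-force way to make that choice is to
inspect the data before writing the FIELD declaration.  This is a
sketch under assumptions, not a proposal: pick_datatype and its
force_unicode flag are hypothetical names, and str.isascii() needs
Python 3.7 or later.

```python
from typing import Iterable


def pick_datatype(values: Iterable[str], force_unicode: bool = False) -> str:
    """Choose a VOTable datatype for a column of Python strings.

    Hypothetical helper: returns "char" when every value is pure ASCII,
    "unicodeChar" otherwise or when the caller insists on it.
    """
    if force_unicode:
        return "unicodeChar"
    return "char" if all(v.isascii() for v in values) else "unicodeChar"


print(pick_datatype(["M31", "NGC 1068"]))
print(pick_datatype(["M31", "\u00c5ngstr\u00f6m"]))
print(pick_datatype(["M31"], force_unicode=True))
```

The obvious drawback is that the whole column has to be scanned
before the table header can be written, and the declared type then
depends on the data rather than on the schema.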

I don't have a solution for this that I like, so: Does anyone here
know of an elegant one somewhere (i.e., nice, compact chars by
default, but with a way for users to say "I want non-ASCII here,
really" where necessary)?

         -- Markus

