[EXTERNAL] Re: Unicode in VOTable
Jonathan Fay
jfay at microsoft.com
Thu Jun 12 18:27:31 CEST 2025
UTF-8 has been instrumental in making the web universal and interoperable, replacing what was a non-interoperable multibyte mess before it.
UTF-8 also has a good fallback: it survives 7-bit, single-byte transports and libraries without special treatment.
Use the length as the byte length of all characters in the field, and let the display logic, which in most cases handles UTF-8 transparently, do the work of rendering.
I think it is the obvious choice.
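As a rough illustration of "length as a byte length", here is a minimal Python sketch. The helper name utf8_byte_length and the sample strings are made up for this example and are not taken from the VOTable specification; it only shows how the character count and the UTF-8 byte count diverge once non-ASCII characters appear, and that pure ASCII text is unchanged.

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes the string occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical field values: pure ASCII, Greek alpha, Latin capital A with ring.
samples = ["NGC 1234", "\u03b1 Cen", "\u00c5"]

for s in samples:
    # len(s) counts Unicode code points; utf8_byte_length(s) counts encoded bytes.
    print(repr(s), "code points:", len(s), "UTF-8 bytes:", utf8_byte_length(s))

# The fallback property: ASCII text is byte-for-byte identical in UTF-8.
ascii_unchanged = "NGC 1234".encode("utf-8") == "NGC 1234".encode("ascii")
```

Under this reading, a field declared with a byte length of 8 holds "NGC 1234" exactly, but fewer than 8 characters as soon as any of them falls outside ASCII.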
Jonathan
-----Original Message-----
From: apps <apps-bounces at ivoa.net> On Behalf Of Mark Taylor via apps
Sent: Thursday, June 12, 2025 5:30 AM
To: Russ Allbery <eagle at eyrie.org>
Cc: Mark Taylor via apps <apps at ivoa.net>
Subject: [EXTERNAL] Re: Unicode in VOTable
On Wed, 11 Jun 2025, Russ Allbery wrote:
> For example, suppose that one has a column in the database that is
> defined as CHAR(8) with a Unicode character set. What should the
> corresponding arraysize in the TAP_SCHEMA entry be for this column? 8
> seems obviously wrong and will truncate valid data. 48 is safe but seems weird.
32, no? The Wikipedia UTF-8 page says "a variable-width encoding of one to four one-byte (8-bit) code units".
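The arithmetic behind 32 can be sketched in Python as follows. This assumes that each of the 8 "characters" in CHAR(8) is a single Unicode code point, which depends on the database; the function and constant names are invented for the example.

```python
# UTF-8 encodes each Unicode code point in 1 to 4 bytes.
MAX_UTF8_BYTES_PER_CODE_POINT = 4


def worst_case_utf8_bytes(n_code_points: int) -> int:
    """Upper bound on the UTF-8 byte length of a string of n code points."""
    return n_code_points * MAX_UTF8_BYTES_PER_CODE_POINT


# One code point from each UTF-8 length class:
# "A" (U+0041), e-acute (U+00E9), euro sign (U+20AC), telescope (U+1F52D).
examples = ["A", "\u00e9", "\u20ac", "\U0001F52D"]
widths = [len(c.encode("utf-8")) for c in examples]

# Eight 4-byte code points fill the worst-case bound for CHAR(8) exactly.
widest = "\U0001F52D" * 8
print(widths, worst_case_utf8_bytes(8), len(widest.encode("utf-8")))
```

If the database instead counts something other than code points (for example UTF-16 code units), the bound would have to be worked out differently.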
> While in general I am in favor of using Unicode everywhere, do we lose
> anything by no longer having a way of marking fields as containing
> simple one-byte-per-character results that don't require any special processing?
It's a fair question, but IMO we don't lose enough to make it a worry.
In most string-processing contexts these days the default processing is UTF-8 anyway, and it's the one-byte-per-character strings that require special measures (e.g. in Java, a sloppy VOTable parser will probably decode char arrays as UTF-8 strings already unless you try hard to stop it).
Also, if people want to use single bytes, there's still the unsignedByte datatype.
Mark
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk https://www.star.bristol.ac.uk/mbt/