Moving forward with modern Unicode / UTF-8

Mark Taylor m.b.taylor at bristol.ac.uk
Thu Jul 17 11:15:49 CEST 2025


On Thu, 17 Jul 2025, Markus Demleitner via apps wrote:

> This problem is of course even more severe when we somehow imply
> utf-8 in char arrays, and concerns that arraysize would become
> something like "storage size" rather than "number of elements" when
> we go that way were too strong for me to happily go to work.

I don't think that problem is all that bad.  We just redefine the
char datatype to mean an octet of UTF-8 storage rather than a
character as such (this is completely backward compatible with
current usage); then arraysize makes sense without special casing.
That does mean you can't define a column containing a fixed number
of Unicode characters (unless you happen to know that only ASCII
is permitted, which may well be the case, e.g. for ISO-8601
datestamps), but I don't see that as much of an inconvenience.
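
To make the octet-versus-character distinction concrete, here is a
rough Python sketch.  It is only an illustration under the reading
above (arraysize counts UTF-8 octets); the helper names
utf8_storage_size and fits_arraysize are made up for this example
and do not come from any VOTable library or from the proposal text.

```python
def utf8_storage_size(text: str) -> int:
    """Return the number of octets needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


def fits_arraysize(text: str, arraysize: int) -> bool:
    """Hypothetical check: does text fit a char field of arraysize octets?

    Assumes arraysize counts UTF-8 octets, not Unicode characters.
    """
    return utf8_storage_size(text) <= arraysize


# ASCII-only content: one octet per character, so both counts agree.
ascii_stamp = "2025-07-17T11:15:49"
print(len(ascii_stamp), utf8_storage_size(ascii_stamp))

# Non-ASCII content: the two accented letters take two octets each,
# so the octet count exceeds the character count.
name = "\u00c5ngstr\u00f6m"
print(len(name), utf8_storage_size(name))
print(fits_arraysize(name, len(name)))
```

For the ASCII datestamp the character and octet counts coincide,
which is why fixed-length ASCII columns are unaffected; for the
second string the octet count is larger, so a field sized by
character count would be too small.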

> As to concrete next steps: I'd say two PRs (one UTF-16 in
> unicodeChar, the other UTF-8 in char) against VOTable would be great,
> and then we can see how much pushback we have against the possible
> weakening of arraysize.
> 
> I *could* see myself volunteering for that if there's really nobody
> else wanting to do that.  But I'd need a few Newtons of gentle
> nudging.

I'd be willing to have a go at such PRs, implementing the proposals
(more or less matching what Markus says above) that I made on the
apps list last month:

   http://mail.ivoa.net/pipermail/apps/2025-June/001765.html

There was some discussion following that post, but nothing that
convinced me I was on the wrong track (it's possible that others
disagree).

I won't get to that right away, so there are at least a couple of
weeks for people to object here that PRs along those lines wouldn't
be the right thing to do (or for somebody else to out-volunteer me).

Mark

--
Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk          https://www.star.bristol.ac.uk/mbt/

