Moving forward with modern Unicode / UTF-8

Mark Taylor m.b.taylor at bristol.ac.uk
Thu Aug 7 09:56:04 CEST 2025


As threatened, I have made a PR following up my ideas on this.
Since the UCS-2->UTF-16 and char->UTF-8 ideas are entangled with
each other, I didn't think it was a good idea to try to split
them into two PRs.
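
For a concrete picture of what the arraysize question quoted below
amounts to, here is a rough Python sketch. It is not taken from the PR;
the function names are hypothetical, and it only assumes the reading
discussed in this thread: arraysize counts UTF-8 octets for char and
16-bit UTF-16 code units for unicodeChar.

```python
def utf8_arraysize(text: str) -> int:
    """Octets needed to store text as UTF-8 (hypothetical helper)."""
    return len(text.encode("utf-8"))


def utf16_arraysize(text: str) -> int:
    """16-bit code units needed to store text as UTF-16 (hypothetical helper)."""
    # Encode without a byte-order mark, then count 2-byte units.
    return len(text.encode("utf-16-be")) // 2


def truncate_utf8(text: str, arraysize: int) -> bytes:
    """Cut text to at most arraysize octets without splitting a character."""
    raw = text.encode("utf-8")[:arraysize]
    # errors="ignore" drops a trailing incomplete multi-byte sequence.
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


# ASCII-only content: octet count equals character count.
print(utf8_arraysize("2025-08-07"))   # 10
# Non-ASCII content: 8 characters, but 10 octets of UTF-8.
print(utf8_arraysize("Ångström"))     # 10
print(utf16_arraysize("Ångström"))    # 8
```

The last two lines show the trade-off: a fixed arraysize no longer
guarantees a fixed number of characters unless the content is known
to be ASCII.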

Discussion encouraged at https://github.com/ivoa-std/VOTable/pull/71

Mark

On Thu, 17 Jul 2025, Mark Taylor wrote:

> On Thu, 17 Jul 2025, Markus Demleitner via apps wrote:
> 
> > This problem is of course even more severe when we somehow imply
> > utf-8 in char arrays, and concerns that arraysize would become
> > something like "storage size" rather than "number of elements" when
> > we go that way were too strong for me to happily go to work.
> 
> I don't think that problem is all that bad.  We just redefine the
> char datatype to mean an octet of UTF-8 storage rather than a
> character as such (this is completely backwardly compatible with
> current usage), then arraysize makes sense without special casing.
> That does mean you can't define a column containing a fixed number
> of Unicode characters (unless you happen to know that only ASCII
> is permitted, which may well be the case, e.g. for ISO-8601 datestamps),
> but I don't see that as much of an inconvenience.
> 
> > As to concrete next steps: I'd say two PRs (one UTF-16 in
> > unicodeChar, the other UTF-8 in char) against VOTable would be great,
> > and then we can see how much pushback we have against the possible
> > weakening of arraysize.
> > 
> > I *could* see myself volunteering for that if there's really nobody
> > else wanting to do that.  But I'd need a few Newtons of gentle
> > nudging.
> 
> I'd be willing to have a go at such PRs, implementing the proposals
> (more or less matching what Markus says above) that I made on the
> apps list last month:
> 
>    http://mail.ivoa.net/pipermail/apps/2025-June/001765.html
> 
> There was some discussion following that post, but nothing that
> convinced me I was on the wrong track (it's possible that others
> disagree).
> 
> I won't get to that right away, so there are at least a couple of
> weeks for people to object here that PRs along those lines wouldn't
> be the right thing to do (or for somebody else to out-volunteer me).
> 
> Mark
> 
> --
> Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
> m.b.taylor at bristol.ac.uk          https://www.star.bristol.ac.uk/mbt/
> 

--
Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk          https://www.star.bristol.ac.uk/mbt/

