Moving forward with modern Unicode / UTF-8
Mark Taylor
m.b.taylor at bristol.ac.uk
Mon Nov 17 19:57:09 CET 2025
Dear Apps WG,
I consider the VOTable Pull Request dealing with this topic
(https://github.com/ivoa-std/VOTable/pull/71) now quite mature,
having been scrutinized and improved by several interested parties.
I presented its content and status (with some other related VOTableiana)
at the Apps session in the Görlitz meeting just finished, here:
https://wiki.ivoa.net/internal/IVOA/InterOpNov2025Apps/votable.pdf
Unless new objections arise, I plan to merge PR #71 into the master
branch of VOTable by the end of this month, with a view to preparing
a VOTable 1.6 WD. If you still have concerns, suggestions or
objections, please voice them before then.
Thanks
Mark
On Thu, 7 Aug 2025, Mark Taylor wrote:
> As threatened I have made a PR following up my ideas on this.
> Since the UCS-2->UTF-16 and char->UTF-8 ideas are entangled with
> each other I didn't think it was a good idea to try to split
> it into two PRs.
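
A rough illustration of why the UCS-2 -> UTF-16 change matters, not
taken from the PR: UCS-2 can only represent code points up to U+FFFF,
whereas UTF-16 encodes higher code points as a surrogate pair of two
16-bit code units. The helper names below are hypothetical, and the
idea that a unicodeChar element corresponds to one 16-bit code unit
is an assumption made here for illustration only.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    # utf-16-be writes no byte order mark, so bytes / 2 = code units.
    return len(text.encode("utf-16-be")) // 2


def fits_ucs2(text: str) -> bool:
    """True if every code point lies in the range UCS-2 can represent."""
    return all(ord(ch) <= 0xFFFF for ch in text)


bmp_text = "G\u00f6rlitz"      # "Görlitz": all code points below U+FFFF
non_bmp_text = "\U0001D6FC"    # one code point above U+FFFF

for sample in (bmp_text, non_bmp_text):
    print(len(sample), utf16_code_units(sample), fits_ucs2(sample))
```

For the first string the character count and the code unit count
agree; for the second, one character takes two code units, which is
where UCS-2 and UTF-16 part ways.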
>
> Discussion encouraged at https://github.com/ivoa-std/VOTable/pull/71
>
> Mark
>
> On Thu, 17 Jul 2025, Mark Taylor wrote:
>
> > On Thu, 17 Jul 2025, Markus Demleitner via apps wrote:
> >
> > > This problem is of course even more severe when we somehow imply
> > > utf-8 in char arrays, and concerns that arraysize would become
> > > something like "storage size" rather than "number of elements" when
> > > we go that way were too strong for me to happily go to work.
> >
> > I don't think that problem is all that bad. We just redefine the
> > char datatype to mean an octet of UTF-8 storage rather than a
> > character as such (this is completely backward compatible with
> > current usage), then arraysize makes sense without special casing.
> > That does mean you can't define a column containing a fixed number
> > of Unicode characters (unless you happen to know that only ASCII
> > is permitted, which may well be the case, e.g. for ISO-8601
> > datestamps), but I don't see that as much of an inconvenience.
> >
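
To make the octet-counting reading of arraysize concrete, here is a
minimal Python sketch. It assumes only what is said above, namely that
each char element is one octet of UTF-8 storage. The function names
are hypothetical, and the truncation policy (never cut inside a
multi-byte sequence) is an illustrative choice, not something the
thread or the PR is claimed to specify.

```python
def utf8_octets(text: str) -> int:
    """Number of UTF-8 octets needed to store text."""
    return len(text.encode("utf-8"))


def fit_to_arraysize(text: str, arraysize: int) -> bytes:
    """Encode text as UTF-8, keeping at most arraysize octets.

    Truncation backs up to a character boundary so that the result
    is still valid UTF-8 (an assumed policy, for illustration).
    """
    data = text.encode("utf-8")
    if len(data) <= arraysize:
        return data
    cut = arraysize
    # data[cut] is the first octet that would be dropped. If it is a
    # continuation octet (bit pattern 10xxxxxx), the cut would split
    # a multi-byte character, so move the cut back.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


stamp = "2025-11-17T19:57:09"   # ASCII only: octets == characters
place = "G\u00f6rlitz"          # "Görlitz": 7 characters, 8 octets

print(len(stamp), utf8_octets(stamp))
print(len(place), utf8_octets(place))
print(fit_to_arraysize(place, 2))
```

The ISO-8601 stamp needs exactly as many octets as it has characters,
so a fixed arraysize still means a fixed character count there. The
second string needs one more octet than it has characters, so a fixed
arraysize bounds its storage rather than its character count.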
> > > As to concrete next steps: I'd say two PRs (one UTF-16 in
> > > unicodeChar, the other UTF-8 in char) against VOTable would be great,
> > > and then we can see how much pushback we have against the possible
> > > weakening of arraysize.
> > >
> > > I *could* see myself volunteering for that if there's really nobody
> > > else wanting to do that. But I'd need a few Newtons of gentle
> > > nudging.
> >
> > I'd be willing to have a go at such PRs, implementing the proposals
> > (more or less matching what Markus says above) that I made on the
> > apps list last month:
> >
> > http://mail.ivoa.net/pipermail/apps/2025-June/001765.html
> >
> > There was some discussion following that post, but nothing that
> > convinced me I was on the wrong track (it's possible that others
> > disagree).
> >
> > I won't get to that right away, so there are at least a couple of
> > weeks for people to object here that PRs along those lines wouldn't
> > be the right thing to do (or for somebody else to out-volunteer me).
> >
> > Mark
> >
> > --
> > Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> > m.b.taylor at bristol.ac.uk https://www.star.bristol.ac.uk/mbt/
> >
>
> --
> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> m.b.taylor at bristol.ac.uk https://www.star.bristol.ac.uk/mbt/
>
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk https://www.star.bristol.ac.uk/mbt/