RegTAP bug with delimited column identifiers

Mark Taylor m.b.taylor at bristol.ac.uk
Mon Aug 23 10:28:19 CEST 2021


I agree, it should be an erratum.

On Fri, 20 Aug 2021, Patrick Dowler wrote:

> I agree that table_name cannot feasibly be modified from whatever the
> provider specified and shouldn't have been in that list.
> It's a bug ... erratum.
> 
> You might even note/warn that if the table_name (and column_name(s)
> elsewhere?) are delimited the RegTAP implementor has to take care to
> preserve the delimiters as well, if that's not stated already.
> 
> --
> Patrick Dowler
> Canadian Astronomy Data Centre
> Victoria, BC, Canada
> 
> 
> On Thu, 5 Aug 2021 at 05:15, Markus Demleitner <
> msdemlei at ari.uni-heidelberg.de> wrote:
> 
> > Dear Registry,
> >
> > ADQL delimited identifiers are a pain.
> >
> > My old mantra just became underscored right now while I was adding a
> > get_tables method to pyVO's registry interface.  That pulls table
> > metadata from rr.res_table and rr.table_column.  And breaks hard on
> > VizieR.
> >
> > You see, RegTAP in sect. 8.6 says:
> >
> >   The following columns MUST be lowercased during ingestion: ivoid,
> >   table_name, table_type, table_utype.
> >
> > In the presence of SQL delimited identifiers, blindly lowercasing
> > table_name is wrong and actively breaks things.  For instance, VizieR
> > has a table J/A+A/437/789/table2; once that's lowercased, it requires
> > a lot of VizieR knowledge to recover that form.
> >
> > But to write a query you need that exact form because to get away
> > with that alphabet soup as a table name, you have to use delimited
> > identifiers in ADQL queries, and delimited identifiers prevent the
> > (about as unfortunate) case folding SQL has almost everywhere else.
> >
> > Hence, rr.res_table is close to useless with the largest TAP service
> > on the planet.  The only reason nobody noticed is that most everyone
> > is turning to GloTS for this kind of thing.  But as you know I've
> > been trying to wean people off the non-standard GloTS for ages, at
> > one of these days it'll have to happen.
> >
> > I can't quite say any more why I would have required case
> > normalisation in table_name; I probably figured "it's machine
> > readable, and SQLs folds case, so let's be consistent with almost
> > everything else machine-readable in RegTAP".
> >
> > I suppose if I wrote this again I'd say "lowercase regular
> > identifiers, leave SQL delimited identifiers alone".  However, that
> > still wouldn't fix things right now because VizieR doesn't include
> > quotes in their VODataService table names just yet.
> >
> > But I'd like to have a fix about now in order to make my get_tables()
> > method useful when the pyvo PR goes in.  And the baroque
> > normalisation rules of SQL are a bit too much for an Erratum anyway
> > IMHO.
> >
> > Also, we are now a lot more explicit that the content of table/name
> > must be exactly as it should be used in TAP, and so there's really no
> > reason for any more normalisation.  Hence... How would people feel if
> > I wrote a RegTAP erratum to the effect:
> >
> >   In section 8.6, from the sentence: “The following columns MUST be
> >   lowercased during ingestion: ivoid, table_name, table_type,
> >   table_utype.” remove “table_name”.  That is, with this erratum,
> >   table_name is ingested literally.
> >
> > I think that won't break anything operationally; if anyone had relied
> > on these table names, they'd have noticed it's broken with many
> > VizieR tables, I suppose.  And less case folding is always a gain.
> > Of course, I'll provide lots of rationale and impact assessment, but
> > before going into the trouble, I'd be grateful for encouraging and/or
> > dissenting voices.
> >
> > Thanks,
> >
> >             Markus
> >
> > PS: I'm taking the liberty of trying this out on dc.g-vo.org and then
> > reg.g-vo.org starting in an hour or so; that's how confident I am
> > that this is a bona fide improvement with very little risk of
> > breaking things.
> >
> 

--
Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk          http://www.star.bristol.ac.uk/~mbt/


More information about the registry mailing list