RegTAP bug with delimited column identifiers

Patrick Dowler pdowler.cadc at gmail.com
Fri Aug 20 19:35:17 CEST 2021


I agree that table_name cannot feasibly be modified from whatever the
provider specified and shouldn't have been in that list.
It's a bug ... erratum.

You might even note/warn that if the table_name (and column_name(s)
elsewhere?) are delimited the RegTAP implementor has to take care to
preserve the delimiters as well, if that's not stated already.

--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada


On Thu, 5 Aug 2021 at 05:15, Markus Demleitner <
msdemlei at ari.uni-heidelberg.de> wrote:

> Dear Registry,
>
> ADQL delimited identifiers are a pain.
>
> My old mantra just became underscored right now while I was adding a
> get_tables method to pyVO's registry interface.  That pulls table
> metadata from rr.res_table and rr.table_column.  And breaks hard on
> VizieR.
>
> You see, RegTAP in sect. 8.6 says:
>
>   The following columns MUST be lowercased during ingestion: ivoid,
>   table_name, table_type, table_utype.
>
> In the presence of SQL delimited identifiers, blindly lowercasing
> table_name is wrong and actively breaks things.  For instance, VizieR
> has a table J/A+A/437/789/table2; once that's lowercased, it requires
> a lot of VizieR knowledge to recover that form.
>
> But to write a query you need that exact form because to get away
> with that alphabet soup as a table name, you have to use delimited
> identifiers in ADQL queries, and delimited identifiers prevent the
> (about as unfortunate) case folding SQL has almost everywhere else.
>
> Hence, rr.res_table is close to useless with the largest TAP service
> on the planet.  The only reason nobody noticed is that most everyone
> is turning to GloTS for this kind of thing.  But as you know I've
> been trying to wean people off the non-standard GloTS for ages, at
> one of these days it'll have to happen.
>
> I can't quite say any more why I would have required case
> normalisation in table_name; I probably figured "it's machine
> readable, and SQLs folds case, so let's be consistent with almost
> everything else machine-readable in RegTAP".
>
> I suppose if I wrote this again I'd say "lowercase regular
> identifiers, leave SQL delimited identifiers alone".  However, that
> still wouldn't fix things right now because VizieR doesn't include
> quotes in their VODataService table names just yet.
>
> But I'd like to have a fix about now in order to make my get_tables()
> method useful when the pyvo PR goes in.  And the baroque
> normalisation rules of SQL are a bit too much for an Erratum anyway
> IMHO.
>
> Also, we are now a lot more explicit that the content of table/name
> must be exactly as it should be used in TAP, and so there's really no
> reason for any more normalisation.  Hence... How would people feel if
> I wrote a RegTAP erratum to the effect:
>
>   In section 8.6, from the sentence: “The following columns MUST be
>   lowercased during ingestion: ivoid, table_name, table_type,
>   table_utype.” remove “table_name”.  That is, with this erratum,
>   table_name is ingested literally.
>
> I think that won't break anything operationally; if anyone had relied
> on these table names, they'd have noticed it's broken with many
> VizieR tables, I suppose.  And less case folding is always a gain.
> Of course, I'll provide lots of rationale and impact assessment, but
> before going into the trouble, I'd be grateful for encouraging and/or
> dissenting voices.
>
> Thanks,
>
>             Markus
>
> PS: I'm taking the liberty of trying this out on dc.g-vo.org and then
> reg.g-vo.org starting in an hour or so; that's how confident I am
> that this is a bona fide improvement with very little risk of
> breaking things.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ivoa.net/pipermail/registry/attachments/20210820/15bcb366/attachment.html>


More information about the registry mailing list