RegTAP bug with delimited column identifiers
Patrick Dowler
pdowler.cadc at gmail.com
Fri Aug 20 19:35:17 CEST 2021
I agree that table_name cannot feasibly be modified from whatever the
provider specified and shouldn't have been in that list.
It's a bug ... erratum.
You might even note/warn that if the table_name (and column_name(s)
elsewhere?) are delimited the RegTAP implementor has to take care to
preserve the delimiters as well, if that's not stated already.
--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada
On Thu, 5 Aug 2021 at 05:15, Markus Demleitner <
msdemlei at ari.uni-heidelberg.de> wrote:
> Dear Registry,
>
> ADQL delimited identifiers are a pain.
>
> My old mantra just became underscored right now while I was adding a
> get_tables method to pyVO's registry interface. That pulls table
> metadata from rr.res_table and rr.table_column. And breaks hard on
> VizieR.
>
> You see, RegTAP in sect. 8.6 says:
>
> The following columns MUST be lowercased during ingestion: ivoid,
> table_name, table_type, table_utype.
>
> In the presence of SQL delimited identifiers, blindly lowercasing
> table_name is wrong and actively breaks things. For instance, VizieR
> has a table J/A+A/437/789/table2; once that's lowercased, it requires
> a lot of VizieR knowledge to recover that form.
>
> But to write a query you need that exact form because to get away
> with that alphabet soup as a table name, you have to use delimited
> identifiers in ADQL queries, and delimited identifiers prevent the
> (about as unfortunate) case folding SQL has almost everywhere else.
>
> Hence, rr.res_table is close to useless with the largest TAP service
> on the planet. The only reason nobody noticed is that most everyone
> is turning to GloTS for this kind of thing. But as you know I've
> been trying to wean people off the non-standard GloTS for ages, at
> one of these days it'll have to happen.
>
> I can't quite say any more why I would have required case
> normalisation in table_name; I probably figured "it's machine
> readable, and SQLs folds case, so let's be consistent with almost
> everything else machine-readable in RegTAP".
>
> I suppose if I wrote this again I'd say "lowercase regular
> identifiers, leave SQL delimited identifiers alone". However, that
> still wouldn't fix things right now because VizieR doesn't include
> quotes in their VODataService table names just yet.
>
> But I'd like to have a fix about now in order to make my get_tables()
> method useful when the pyvo PR goes in. And the baroque
> normalisation rules of SQL are a bit too much for an Erratum anyway
> IMHO.
>
> Also, we are now a lot more explicit that the content of table/name
> must be exactly as it should be used in TAP, and so there's really no
> reason for any more normalisation. Hence... How would people feel if
> I wrote a RegTAP erratum to the effect:
>
> In section 8.6, from the sentence: “The following columns MUST be
> lowercased during ingestion: ivoid, table_name, table_type,
> table_utype.” remove “table_name”. That is, with this erratum,
> table_name is ingested literally.
>
> I think that won't break anything operationally; if anyone had relied
> on these table names, they'd have noticed it's broken with many
> VizieR tables, I suppose. And less case folding is always a gain.
> Of course, I'll provide lots of rationale and impact assessment, but
> before going into the trouble, I'd be grateful for encouraging and/or
> dissenting voices.
>
> Thanks,
>
> Markus
>
> PS: I'm taking the liberty of trying this out on dc.g-vo.org and then
> reg.g-vo.org starting in an hour or so; that's how confident I am
> that this is a bona fide improvement with very little risk of
> breaking things.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ivoa.net/pipermail/registry/attachments/20210820/15bcb366/attachment.html>
More information about the registry
mailing list