TAP questions

Thu Mar 1 12:14:18 CET 2018

Hi Markus,

Thank you for the thorough answers. I'm writing my own stuff because we use SQL Server and .Net at JHU for many things and I need a library that streamlines data ingestion from a remote web service directly into SQL Server even when data sets are huge. I've eventually worked around all the problems mentioned below in one way or another; I just wanted to let the community know about them so that a minor revision of the standards could clarify these issues.

The remaining issue is identifier quoting and table naming (whether schemaname.tablename is required or simply tablename is enough). This is why: SkyQuery will soon make it possible to write a cross-match query that references TAP sources as well as databases available locally at our db cluster. This requires parsing the queries and resolving the identifiers against the TAP sources. Now, my parser supports identifier quoting for both column and table names, so I only have unquoted identifiers after the parsing procedure, and based on these I have to compose ADQL queries that I can send to any TAP endpoint that speaks the standard. If I can't be sure whether an endpoint understands/requires/tolerates quoting or whether it combines schema names with table names, I can't really make it work with just any endpoint. Do you have any suggestions regarding this issue?

Thanks,

-Laszlo

-----Original Message-----
From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of Markus Demleitner
Sent: Thursday, March 1, 2018 10:14 AM
To: dal at ivoa.net
Subject: Re: TAP questions

Hi László,

On Wed, Feb 28, 2018 at 05:35:47PM +0100, Dobos, László wrote:

> 1. The first question is about how schema and table names should be 
> handled by the TAP_SCHEMA view. For instance, the gavo TAP endpoint at

The TAP spec says (TAP 1.1 Draft, p. 24):

  The value of the table_name should be the string that is recommended
  for use in querying the table; it may or may not be qualified by
  schema and catalog name(s) depending on the implementation
  requirements.  [...] If the table name is such that the name must be
  quoted (delimited identifier in ADQL) then the value must include the
  quotes.

(similar, perhaps a bit more ambiguous, language is in TAP 1.0).  So, the bottom line is: take what's in table_name and don't touch it.
It's the operator's responsibility to get that right.

> the table_name column and schema_name is separate. It is 
> straightforward to remove the schema name from the table_name column 
> if it is the same as schema_name but it's not so straightforward to 
> compose a query, which, for instance, gets the columns of a given 
> table if I only know the schema name and the table name separately. Or 
> should I go for compatibility across

That should not happen. I've always lobbied for making table_name in tap_schema.columns an explicit foreign key into tap_schema.tables, and certainly GloTS handles it like this (because I'd go crazy otherwise).  The TAP spec, as far as I can see, doesn't explicitly require it, but if someone uses different strings in tables.table_name and columns.table_name, they'll not show up in TOPCAT right now.

Anyway, there's no sane way to discover the columns otherwise.  So:
again, just don't touch the table_name.
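
As a minimal sketch of "don't touch the table_name", assuming the value has been read from tap_schema.tables beforehand: the string is used verbatim both as a literal when looking up columns and as the table reference in FROM.  The function names are hypothetical and not part of any TAP library.

```python
def columns_query(table_name):
    """Build an ADQL query listing the columns of one table.

    table_name is the value of tap_schema.tables.table_name, passed
    through unchanged (no quotes added, no schema stripped).
    """
    # Only escape it as an ADQL string literal: double any single quote.
    literal = table_name.replace("'", "''")
    return (
        "SELECT column_name, datatype FROM TAP_SCHEMA.columns "
        f"WHERE table_name = '{literal}'"
    )


def select_query(table_name, column_names, top=10):
    """Build a SELECT using TAP_SCHEMA's own spelling of all identifiers."""
    return f"SELECT TOP {int(top)} {', '.join(column_names)} FROM {table_name}"
```

With the table from the example further down, select_query("fk6.part1", ["raj2000", "dej2000"]) yields a query without any added quotes.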

> On the other hand the GAVO tap interface at throws an error for quoted 
> table names while handles quoted columns correctly:
> 
>  
> 
> SELECT  TOP 10 "raj2000", "dej2000" FROM "fk6"."part1" -- results in 
> error "'QuotedName' object has no attribute 'upper'"

Ok, that's a bug, and I'll fix it, but it would still be wrong to wantonly add quotes.  Delimited identifiers have very funky properties, and any number of things can go wrong if you assume fk6="fk6" in a SQL database (starting with the fact that SQL92 requires fk6="FK6" (if anything of that sort) -- but really, there's no telling).

Rule: *never* convert a SQL regular identifier into a delimited identifier unless you know what you're doing and why.  Which essentially is never the case with TAP/ADQL.  Use the form provided by TAP_SCHEMA (or, hopefully equivalently, /tables).
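
For a client that has to resolve user-written names against TAP_SCHEMA, the rule could look roughly like the sketch below.  It assumes that regular identifiers may be compared case-insensitively, that anything containing a delimited part is only compared exactly, and that the spelling handed back to the service is always the one from TAP_SCHEMA.  The function name resolve is hypothetical.

```python
def has_delimited_part(identifier):
    """True if any part of a (possibly qualified) name is double-quoted."""
    return '"' in identifier


def resolve(user_identifier, schema_names):
    """Return TAP_SCHEMA's spelling matching user_identifier, or None.

    Quotes are never added or removed: a regular identifier only
    matches a regular one, a delimited one only an identical string.
    """
    for name in schema_names:
        if has_delimited_part(user_identifier) or has_delimited_part(name):
            # Delimited identifiers: no guessing, exact match only.
            if user_identifier == name:
                return name
        elif user_identifier.lower() == name.lower():
            # Regular identifiers: assumed case-insensitive.
            return name
    return None
```

This requires the parser to remember whether the user wrote a name with quotes, instead of discarding that information.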

> 3. My third question is about how to deal with missing or wrong xml 
> namespaces. This is often an issue with /capabilities. For example, a 
> number of services (at least 
> http://heasarc.gsfc.nasa.gov:80/xamin/vo/tajp) returns an xml with the 
> namespace http://www.ivoa.net/xml/TAP/v1.0 which gives me a 404. Is it 
> something that's allowed by the standard or I'm supposed to come

While namespace URIs don't need to resolve in general, IVOA ones do, so if they point nowhere, they're probably wrong.  In this case, it should be

http://www.ivoa.net/xml/TAPRegExt/v1.0

The standards-compliant way to handle bad namespace URIs is to fail; as far as XML is concerned the type

{http://www.ivoa.net/xml/TAP/v1.0}TableAccess

(as declared by xamin now) has no relationship whatsoever to the type

{http://www.ivoa.net/xml/TAPRegExt/v1.0}TableAccess

that the standard mentions and that clients should expect.

If I were to write a client, however, I'd follow the golden rule of
interoperability: Be strict in what you generate and lenient in what you accept.  If what's coming back looks like a TAPRegExt capability, I'd swallow it.  Leave it to the validators to nitpick.
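
A lenient reader along those lines might be sketched as follows, using only Python's standard library.  It assumes the capability is marked with an xsi:type whose local part is TableAccess, and it deliberately ignores which namespace the prefix is bound to; the helper names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def local_name(qname):
    """Strip a '{namespace}' or 'prefix:' qualifier from a name."""
    return qname.rsplit("}", 1)[-1].rsplit(":", 1)[-1]


def table_access_capabilities(capabilities_xml):
    """Return capability elements that look like TAPRegExt TableAccess.

    Only local names are compared, so a service declaring the wrong
    namespace URI for its prefix is still accepted.
    """
    root = ET.fromstring(capabilities_xml)
    return [
        el for el in root.iter()
        if local_name(el.tag) == "capability"
        and local_name(el.get(XSI_TYPE, "")) == "TableAccess"
    ]
```

A validator would do the opposite and compare the full namespace-qualified type.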

Incidentally, I'm still not sure why you would want to resolve the namespace URI.  If you feel you must, I'd argue you have a problem.
The IVOA servers certainly aren't designed to get a couple of hits for the schemas per TAP request, globally.

> up with a workaround? A similar issue is with the VOTables returned 
> by, for example by http://datalab.noao.edu/tap and 
> https://heasarc.gsfc.nasa.gov/xamin/vo/tap/ , which lack the default
> namespace:

Don't handle VOTable yourself if you're programming in a halfway standard language -- use a VOTable library.  It'll probably do the right [TM] thing (which often includes a bit of fudging).

[though you're right, these two are invalid VOTable; they'd need an xmlns="http://www.ivoa.net/xml/VOTable-1.2.xsd" in their roots; and only then will the xsi:noNamespaceSchemaLocation="http://www.ivoa.net/xml/VOTable-1.2.xsd"
that's present do its magic].

And on your other point:

> There's a few services which ping-pong the client between http and 
> https by sending a 302 or 303 when the service url is accessed without 
> https but then after a POST to /async, the 302 URL is a http://... but 
> sending a GET to it redirects further to https://. This sort of breaks 
> client logic because if I turn on automatic redirect follow in the 
> http client library then it gets redirected even after the first 
> /async POST. But turning on automatic redirect follow in a client lib 
> is a dangerous thing anyway, especially if it's running in a server environment.
>
> One example to this is the TAP endpoint at 
> http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/tap/
>
> I can again come up with workarounds but specifying in the standard 
> that redirects should go to the same URL where the POST was sent to 
> would be better.

So, the way UWS is supposed to work, you are never expected to re-POST the parameters.  You POST them, and you get a redirect, but that you can just GET without any parameters.

That's good, because POST and redirects don't mix well (RFC 2616,
10.3: "The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.")
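
A minimal sketch of that flow with Python's urllib, assuming the service answers the job-creating POST with a 303 (or, leniently, a 302) and a Location header.  Automatic redirects are switched off for the POST so that the parameters are never re-sent; create_job and job_url are hypothetical names.

```python
import urllib.error
import urllib.parse
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Keep urllib from following 3xx responses on its own."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # the 3xx then surfaces as an HTTPError


def job_url(post_url, status, location):
    """Return the absolute job URL announced by the POST response."""
    if status not in (302, 303) or not location:
        raise ValueError(f"expected a redirect with Location, got {status}")
    # Location may be relative; resolve it against the URL we posted to.
    return urllib.parse.urljoin(post_url, location)


def create_job(async_url, params):
    """POST the parameters exactly once and return the job URL."""
    opener = urllib.request.build_opener(_NoRedirect)
    body = urllib.parse.urlencode(params).encode("ascii")
    try:
        with opener.open(async_url, data=body) as resp:
            status, location = resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        status, location = err.code, err.headers.get("Location")
    return job_url(async_url, status, location)
```

The returned URL can then be fetched with a plain parameterless GET, for which following a further redirect is harmless.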

Now, it's conceivable that individual operators mix this with the (IMHO annoying) practice of redirecting http to https.  In the presence of a POST, that's a bug and should be reported as such (even for GETs, I'd say that's fairly odious).

       -- Markus