Qualified/unqualified quoted/unquoted

Dave Morris dave.morris at metagrid.co.uk
Thu Oct 16 16:58:31 CEST 2014


On 2014-10-16 10:39, LANDAIS Gilles (OBS) wrote:
> 
> Due to the important volumetry (~25,000 table , ~ 300,000 columns),
> the web resource /tables of TAPVizieR provides the schema without the
> columns descriptions.
> This output enables web applications like TAPHandle to work with
> TAPVizieR with a reasonable size of VOTable.
> Currently, TAPVizieR provides a non-standard REST URL to get the full
> table description (with columns). The output URL uses the same XML
> schema than the standard resource /tables (VODataService/1.1).
> 
> Example:
> http://tapvizier.u-strasbg.fr/TAPVizieR/tap/tables
> http://tapvizier.u-strasbg.fr/TAPVizieR/tap/tables/II/246/out
> 

Publishing the metadata for the TAPVizieR service highlights some gaps 
in the current VODataService and TAP_SCHEMA specifications that will 
need to be clarified before this service can interoperate with other TAP 
services in the VO.

This is not the fault of the TAPVizieR service; it is due to omissions 
in the current VODataService and TAP_SCHEMA specifications, which are 
not precise enough to handle the TAPVizieR metadata.

----

The /tables endpoint

     http://tapvizier.u-strasbg.fr/TAPVizieR/tap/tables

lists 477 instances of table names with two dots but no quotes.

For example :

     <name>vbig.J/other/PZ/29.1/table</name>

----

Section 3.3 of the VODataService-1.1 specification defines the <name> 
element as containing :

     "A fully qualified name for the table."

     "This name should include all catalog or schema
     prefixes needed to sufficiently uniquely
     distinguish it in a query to the table."

However the VODataService-1.1 specification does not describe how to 
handle a table name that includes non-delimiter dots in it.

----

A literal reading of the text in the VODataService-1.1 specification

     "A fully qualified name for the table."

implies that a /tables result containing

     <name>vbig.J/other/PZ/29.1/table</name>

refers to

     a catalog called

         'vbig'

     a schema called

         'J/other/PZ/29'

     a table called

         '1/table'

whereas a human interpreter may guess based on context that this 
actually refers to

     a schema called

         'vbig'

     a table called

         'J/other/PZ/29.1/table'
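
To make the two readings concrete, here is a minimal Python sketch. The 
function names naive_split and first_dot_split are mine, purely for 
illustration; neither rule comes from any of the specifications.

```python
def naive_split(name):
    """Literal reading: every dot is a delimiter."""
    return name.split(".")


def first_dot_split(name):
    """Human guess: only the first dot is a delimiter."""
    return name.split(".", 1)


name = "vbig.J/other/PZ/29.1/table"
print(naive_split(name))      # ['vbig', 'J/other/PZ/29', '1/table']
print(first_dot_split(name))  # ['vbig', 'J/other/PZ/29.1/table']
```

Both results are consistent with the unquoted string, which is the 
problem: a client has no way to know which rule the service intended.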

----

The current VODataService-1.1 specification needs to be updated to 
describe how the /tables output should use quotes to wrap names that 
contain non-delimiter dots or other characters outside the basic set of 
alphanumeric characters.

----

In this example the schema and table names should probably be wrapped in 
double quotes to indicate which dot is part of the table name and which 
is the delimiter between schema and table.

     <name>"vbig"."J/other/PZ/29.1/table"</name>
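
As a rough sketch of how a client could read such a quoted name, the 
Python below splits on dots only when they fall outside double quotes. 
The function name split_qualified is hypothetical, and treating a 
doubled quote ("") as a literal quote character is my assumption, 
borrowed from SQL delimited identifiers; the VODataService text does 
not say this.

```python
def split_qualified(name):
    """Split a dotted name into parts, honouring double-quoted parts.

    Assumption: inside quotes, "" stands for one literal quote character.
    """
    parts, current, in_quotes, i = [], [], False, 0
    while i < len(name):
        ch = name[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(name) and name[i + 1] == '"':
                    current.append('"')  # escaped quote inside a name
                    i += 1
                else:
                    in_quotes = False    # closing quote
            else:
                current.append(ch)       # dots in here are not delimiters
        elif ch == '"':
            in_quotes = True             # opening quote
        elif ch == ".":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted name: " + name)
    parts.append("".join(current))
    return parts


print(split_qualified('"vbig"."J/other/PZ/29.1/table"'))
# ['vbig', 'J/other/PZ/29.1/table']
```

With the quotes in place there is only one possible reading, and 
unquoted names such as twomass.data still split as before.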

----

The same table metadata is also available from the TAPVizieR TAP service

     http://tapvizier.u-strasbg.fr/TAPVizieR/

via a TAP_SCHEMA query

     "SELECT schema_name, table_name FROM TAP_SCHEMA.tables"

which returns a VOTable containing

     <TR>
         <TD>vbig</TD>
         <TD>J/other/PZ/29.1/table</TD>
     </TR>

----

Section 2.6 of the TAP-1.0 specification defines the table_name column 
as

     "table name as it should be used in queries"

The text below this adds a bit more detail to the definition, but it is 
still less specific about qualifying the table name than the equivalent 
text in the VODataService-1.1 specification

     "The value of the table_name should be
     the string that is recommended for use
     in querying the table; it may or may not
     be qualified by schema and catalog name(s)
     depending on the implementation requirements."

Given the current definition of 'may or may not be qualified', the table 
name in this example could be interpreted as

     a schema called

         'J/other/PZ/29'

     a table called

         '1/table'

or as

     a table called

         'J/other/PZ/29.1/table'

From context we can guess that this does in fact represent the 
unqualified table name containing a non-delimiter dot.

But this is a *guess*, and is not covered by the rules for representing 
qualified or unqualified names that may or may not contain non-delimiter 
dots.

----

The current TAP-1.0 specification needs to be updated to describe how 
the metadata in the TAP_SCHEMA tables should use quotes to wrap names 
that contain non-delimiter dots or other characters outside the basic 
set of alphanumeric characters.

----

In this example the table name in the table_name column should probably 
be wrapped in double quotes to indicate that the dot is part of the 
table name and not a delimiter between schema and table.

     <TD>"J/other/PZ/29.1/table"</TD>

----

For comparison, sending the same TAP_SCHEMA query to the Gavo TAP 
service

     http://dc.zah.uni-heidelberg.de/__system__/adql/query/form

     "SELECT schema_name, table_name FROM TAP_SCHEMA.tables"

returns a VOTable containing

     <TR>
         <TD>twomass</TD>
         <TD>twomass.data</TD>
     </TR>

If we apply the same parsing rules that we used for the TAPVizieR 
results, then this could refer to

     a schema called

         'twomass'

     and a table called

         'twomass.data'

or this could refer to

     a schema called

         'twomass'

     and a table called

         'data'

Applying the same set of parsing rules that were needed to interpret the 
TAPVizieR TAP_SCHEMA results to the Gavo TAP_SCHEMA results means that 
the table names in the Gavo TAP_SCHEMA results may be open to 
misinterpretation.

Note - there is nothing in any of the specifications that says that we 
cannot have combinations of catalogs, schemas, tables and columns with 
the same names.

Just because the table name 'twomass.data' starts with the same 
sub-string as the schema name 'twomass' does not by itself mean that 
'twomass.data' is the qualified table name including the parent schema 
name and delimited by a dot, rather than a table name which just happens 
to start with the same sub-string as the parent schema name and contain 
a non-delimiting dot.
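
A small Python sketch of the prefix heuristic a client might be tempted 
to use shows why it does not settle the question. The function name 
candidate_readings is hypothetical, and the sketch only models the 
"table_name starts with schema_name plus a dot" guess, not every 
possible way of splitting the name.

```python
def candidate_readings(schema_name, table_name):
    """List the (schema, table) readings an unquoted row allows."""
    # Reading 1: table_name is already unqualified.
    readings = [(schema_name, table_name)]
    prefix = schema_name + "."
    if table_name.startswith(prefix):
        # Reading 2: table_name is qualified by its parent schema.
        readings.append((schema_name, table_name[len(prefix):]))
    return readings


print(candidate_readings("twomass", "twomass.data"))
# [('twomass', 'twomass.data'), ('twomass', 'data')]
print(candidate_readings("vbig", "J/other/PZ/29.1/table"))
# [('vbig', 'J/other/PZ/29.1/table')]
```

Whenever the function returns more than one reading, the metadata alone 
cannot tell the client which one is correct.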

----

We could simplify the parsing rules by defining both the schema name and 
table name as always unqualified, removing the need for using quotes 
within the metadata.

     <TR>
         <TD>vbig</TD>
         <TD>J/other/PZ/29.1/table</TD>
     </TR>

and

     <TR>
         <TD>twomass</TD>
         <TD>data</TD>
     </TR>

Note - in order to use the fully unqualified schema name we would have 
to add a separate column/element to the metadata to contain the catalog 
name.

----

We could simplify the parsing rules by making the table names always 
fully qualified and always wrap all the names in quotes.

     <TR>
         <TD>"vbig"</TD>
         <TD>"vbig"."J/other/PZ/29.1/table"</TD>
     </TR>

and

     <TR>
         <TD>"twomass"</TD>
         <TD>"twomass"."data"</TD>
     </TR>

Note - the schema name also needs to be quoted because schema names may 
be qualified with a catalog name and both the schema and catalog names 
may themselves contain non-delimiter dots or other non-alphanumeric 
characters.
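
For illustration, generating names under this "always qualified, always 
quoted" option could look like the Python below. The function names 
quote_name and qualified_name are mine, and doubling an embedded double 
quote is an assumption taken from SQL delimited identifiers rather than 
from the TAP or VODataService text.

```python
def quote_name(part):
    """Wrap one name in double quotes, doubling any embedded quotes."""
    return '"' + part.replace('"', '""') + '"'


def qualified_name(*parts):
    """Join catalog/schema/table parts into a quoted, dot-delimited name."""
    return ".".join(quote_name(p) for p in parts)


print(qualified_name("vbig", "J/other/PZ/29.1/table"))
# "vbig"."J/other/PZ/29.1/table"
print(qualified_name("twomass", "data"))
# "twomass"."data"
```

The attraction of this option is that the service side needs no 
per-name decisions: every part is quoted, whether it needs it or not.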

----

We could try to define a more complex set of conditional rules which 
work for both the Gavo and TAPVizieR metadata and are compatible with 
the existing service and client implementations.
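
One possible shape for such a conditional rule, offered only as a 
starting point for discussion: quote a name only when it is not a plain 
identifier. In the Python sketch below the function name quote_if_needed 
and the pattern (a letter followed by letters, digits or underscores) 
are my assumptions. It does not handle reserved words, which would also 
need quoting.

```python
import re

# Assumed definition of a name that is safe to leave unquoted.
PLAIN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def quote_if_needed(part):
    """Return the name unchanged if plain, otherwise double-quoted."""
    if PLAIN_NAME.fullmatch(part):
        return part
    return '"' + part.replace('"', '""') + '"'


print(quote_if_needed("twomass"))                # twomass
print(quote_if_needed("J/other/PZ/29.1/table"))  # "J/other/PZ/29.1/table"
```

Under this rule the existing Gavo names would stay as they are, and only 
names like the TAPVizieR ones would gain quotes.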

----

What do you think ?
Anyone like to have a go at defining the rules for qualified/unqualified 
quoted/unquoted names ?

--------
Dave Morris
Software Developer
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------




More information about the dal mailing list