Qualified/unqualified quoted/unquoted

Marco Molinaro molinaro at oats.inaf.it
Fri Oct 17 12:00:23 CEST 2014


Dear Dave, dear all,

(long email - beware)

Summarizing Dave's mail and adding my perspective to it
(Dave, please check whether I'm missing or misunderstanding anything from
your mail).

--- Two examples reported

A) example from VizieR:
vbig.J/other/PZ/29.1/table
OR
<TR>
        <TD>vbig</TD>
        <TD>J/other/PZ/29.1/table</TD>
</TR>
B) example from GAVO:
twomass.data
OR
<TR>
        <TD>twomass</TD>
        <TD>twomass.data</TD>
</TR>

--- Three interpretations outlined

1) VODataService-1.1 (fully qualified table name: catalog | schema | table)
A -> vbig | J/other/PZ/29 | 1/table
B -> N/A | twomass | data

proposed: quote (") all the SQL elements
This does not seem to solve the _catalog_ use case, but I wouldn't take
catalogs into account in service descriptions anyway.

2) TAP-1.0 (name to be used in queries, may or may not be qualified)
A -> conflict if the . in the table_name value is read as a delimiter,
because we would then have a schema name and a table in that schema that
seems to belong to another schema.
B -> seems ok if one applies the "common" interpretation of the
TAP_SCHEMA fields

proposed: define quoting.
Catalogs again seem to be out of scope.

3) Human
Human inspection resolves the ambiguity by common sense, but machine-driven
processes cannot rely on common sense.

Now, the mail from Gilles and this reply from Dave follow on from the
discussion on TAP-1.1 we had in Banff. It is clear that the tables endpoint
in TAP needs upgrading, because it is currently not as scalable as it needs
to be (as the VizieR use case clearly shows).
However, the VizieR use case seems to push the issue with the current TAP
tables endpoint beyond simple scalability, given its use of the full
available character set in table names.

My opinion is that quoting the schema and table names plus defining what we
mean by qualified names should work, but I have some concerns.

The first concerns catalogs, touched on in Dave's mail: they are currently
mentioned nowhere, and introducing them now may create confusion. Can we
just consider the DB tree as starting at schemas?

The second is that we risk a "fully quoted" scenario. This may be mainly a
client-side problem, but I think it makes the queries less neat and
readable.

If I remember correctly from Banff, there is the idea to have the tables
endpoint as a ReST tree. If so, how can we cope with quoted parts in the
URLs? Do we encode them preserving the schema/table[/column] tree? Or will
it be up to the server to provide asymmetric trees if it makes use of "/"
in schema or table names?
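One possible answer, sketched here under assumptions, is to put the raw (unquoted) names into the URL and percent-encode each one as a single path segment, so a "/" inside a name becomes %2F and the schema/table tree keeps a fixed depth. The {base}/{schema}/{table} layout and the helpers table_url and split_table_url are hypothetical; the existing TAPVizieR URL quoted further down (.../tables/II/246/out) instead puts the slashes of the name straight into the path.

```python
from typing import Tuple
from urllib.parse import quote, unquote


def table_url(base: str, schema: str, table: str) -> str:
    """Build a hypothetical {base}/{schema}/{table} URL.

    quote(..., safe="") percent-encodes "/" as %2F, so each name stays
    a single path segment and the tree depth does not depend on the name.
    """
    segments = [base.rstrip("/"), quote(schema, safe=""), quote(table, safe="")]
    return "/".join(segments)


def split_table_url(url: str, base: str) -> Tuple[str, str]:
    """Recover (schema, table) from a URL built by table_url()."""
    rest = url[len(base.rstrip("/")) + 1:]
    schema_seg, table_seg = rest.split("/")  # exactly two segments expected
    return unquote(schema_seg), unquote(table_seg)


base = "http://tapvizier.u-strasbg.fr/TAPVizieR/tap/tables"
url = table_url(base, "vbig", "J/other/PZ/29.1/table")
print(url)
# http://tapvizier.u-strasbg.fr/TAPVizieR/tap/tables/vbig/J%2Fother%2FPZ%2F29.1%2Ftable
print(split_table_url(url, base))
# ('vbig', 'J/other/PZ/29.1/table')
```

A caveat: some web servers refuse or decode %2F in paths before routing (Apache httpd, for instance, rejects encoded slashes unless AllowEncodedSlashes is enabled), so a server adopting such a layout would need to check its stack.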

All of these points, I think, have an impact on the path we should follow
to solve the scalability issue of the tables endpoint. It was already
pointed out that a VOSI update may be a solution; do Dave's suggestions
point towards a VODataService revision? Or could we consider rewording
VODataService at the errata level?

As for Gilles' suggestion to add a url element to the tables, I'd prefer
leveraging the name & title coupling to build the ReST tree, without the
need to add further elements to the existing schema.

All of the above is just my personal view,
Cheers,
     Marco

2014-10-16 16:58 GMT+02:00 Dave Morris <dave.morris at metagrid.co.uk>:

> On 2014-10-16 10:39, LANDAIS Gilles (OBS) wrote:
>
>>
>> Due to the large volume (~25,000 tables, ~300,000 columns),
>> the web resource /tables of TAPVizieR provides the schema without the
>> column descriptions.
>> This output enables web applications like TAPHandle to work with
>> TAPVizieR with a reasonable size of VOTable.
>> Currently, TAPVizieR provides a non-standard REST URL to get the full
>> table description (with columns). The output of this URL uses the same
>> XML schema as the standard resource /tables (VODataService/1.1).
>>
>> Example:
>> http://tapvizier.u-strasbg.fr/TAPVizieR/tap/tables
>> http://tapvizier.u-strasbg.fr/TAPVizieR/tap/tables/II/246/out
>>
>>
> Publishing the metadata for the TAPVizieR service highlights some gaps in
> the current VODataService and TAP_SCHEMA specifications that will need to
> be clarified before this service can interoperate with other TAP services
> in the VO.
>
> This is not the fault of the TAPVizieR service, this is due to some
> omissions in the current VODataService and TAP_SCHEMA specifications which
> are not precise enough to handle the TAPVizieR metadata.
>
> ----
>
> The /tables endpoint
>
>     http://tapvizier.u-strasbg.fr/TAPVizieR/tap/tables
>
> lists 477 instances of table names with two dots but no quotes.
>
> For example :
>
>     <name>vbig.J/other/PZ/29.1/table</name>
>
> ----
>
> Section 3.3 of the VODataService-1.1 specification defines the <name>
> element as containing :
>
>     "A fully qualified name for the table."
>
>     "This name should include all catalog or schema
>     prefixes needed to sufficiently uniquely
>     distinguish it in a query to the table."
>
> However the VODataService-1.1 specification does not describe how to
> handle a table name that includes non-delimiter dots in it.
>
> ----
>
> A literal reading of the text in the VODataService-1.1
> specification
>
>     "A fully qualified name for the table."
>
> implies that a /tables result containing
>
>     <name>vbig.J/other/PZ/29.1/table</name>
>
> refers to
>
>     a catalog called
>
>         'vbig'
>
>     a schema called
>
>         'J/other/PZ/29'
>
>     a table called
>
>         '1/table'
>
> whereas a human interpreter may guess based on context that this actually
> refers to
>
>     a schema called
>
>         'vbig'
>
>     a table called
>
>         'J/other/PZ/29.1/table'
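The literal reading can be made concrete with a small sketch (literal_split is a made-up name): it treats every dot as a delimiter and assigns the parts right to left to table, schema and catalog. This reproduces the 'vbig' / 'J/other/PZ/29' / '1/table' split rather than the human reading.

```python
from typing import Dict


def literal_split(name: str) -> Dict[str, str]:
    """Literal reading of a VODataService <name>: every dot is a delimiter.

    The rightmost part is the table, the next the schema, the next the
    catalog; a name with more than three parts is rejected.
    """
    parts = name.split(".")
    if len(parts) > 3:
        raise ValueError(f"too many dot-separated parts: {name!r}")
    labels = ["table", "schema", "catalog"]
    return {label: part for label, part in zip(labels, reversed(parts))}


print(literal_split("vbig.J/other/PZ/29.1/table"))
# {'table': '1/table', 'schema': 'J/other/PZ/29', 'catalog': 'vbig'}
print(literal_split("twomass.data"))
# {'table': 'data', 'schema': 'twomass'}
```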
>
> ----
>
> The current VODataService-1.1 specification needs to be updated to
> describe how the /tables output should use quotes to wrap names that
> contain non-delimiter dots or other characters outside the basic set of
> alphanumeric characters.
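As a sketch of what such a rule might look like, the code below quotes a name only when it is not a regular identifier. It assumes the SQL/ADQL notion of a regular identifier (a Latin letter followed by Latin letters, digits or underscores) and the SQL convention of doubling an embedded double quote. The function quote_if_needed is hypothetical, and the check for reserved words, which would also need quoting, is left out.

```python
import re

# Assumed rule: a regular identifier is a Latin letter followed by Latin
# letters, digits or underscores. Reserved words are not checked here.
_REGULAR = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def quote_if_needed(name: str) -> str:
    """Return name unchanged if it is a regular identifier, otherwise a
    double-quoted delimited identifier with embedded quotes doubled."""
    if _REGULAR.fullmatch(name):
        return name
    return '"' + name.replace('"', '""') + '"'


for n in ["vbig", "twomass", "J/other/PZ/29.1/table", 'odd"name']:
    print(quote_if_needed(n))
# vbig
# twomass
# "J/other/PZ/29.1/table"
# "odd""name"
```

Quoting only when needed keeps ordinary names such as twomass readable. Since delimited identifiers are matched case-sensitively in SQL, such a rule would also have to say whether names are published in the exact case the database uses.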
>
> ----
>
> In this example the schema and table names should probably be wrapped in
> double quotes to indicate which dot is part of the table name and which is
> the delimiter between schema and table.
>
>     <name>"vbig"."J/other/PZ/29.1/table"</name>
>
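With quotes in place, a client can find the delimiter dots mechanically. The sketch below (split_qualified is a hypothetical name) splits only on dots outside double quotes and collapses doubled quotes inside quoted parts. It is deliberately lenient: unquoted parts are not validated against any identifier rules.

```python
from typing import List


def split_qualified(name: str) -> List[str]:
    """Split a qualified name on dots that are outside double quotes."""
    parts: List[str] = []
    buf: List[str] = []
    in_quotes = False
    i = 0
    while i < len(name):
        ch = name[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(name) and name[i + 1] == '"':
                    buf.append('"')  # doubled quote = literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                buf.append(ch)  # dots inside quotes are part of the name
        elif ch == '"':
            in_quotes = True  # opening quote
        elif ch == ".":
            parts.append("".join(buf))  # delimiter dot
            buf = []
        else:
            buf.append(ch)
        i += 1
    if in_quotes:
        raise ValueError(f"unterminated quote in {name!r}")
    parts.append("".join(buf))
    return parts


print(split_qualified('"vbig"."J/other/PZ/29.1/table"'))
# ['vbig', 'J/other/PZ/29.1/table']
print(split_qualified("vbig.J/other/PZ/29.1/table"))
# ['vbig', 'J/other/PZ/29', '1/table']
```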
> ----
>
> The same table metadata is also available from the TAPVizieR TAP service
>
>     http://tapvizier.u-strasbg.fr/TAPVizieR/
>
> via a TAP_SCHEMA query
>
>     "SELECT schema_name, table_name FROM TAP_SCHEMA.tables"
>
> which returns a VOTable containing
>
>     <TR>
>         <TD>vbig</TD>
>         <TD>J/other/PZ/29.1/table</TD>
>     </TR>
>
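For reference, a TAP-1.0 service accepts such a query on its synchronous endpoint as an HTTP GET with the REQUEST=doQuery, LANG=ADQL and QUERY parameters. The sketch only builds the URL. The base URL http://tapvizier.u-strasbg.fr/TAPVizieR/tap is an assumption inferred from the /tables URL in Gilles' mail, and tap_sync_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def tap_sync_url(base: str, adql: str) -> str:
    """Build a TAP-1.0 synchronous query URL for an HTTP GET request."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    # urlencode() form-encodes the values (spaces become "+").
    return base.rstrip("/") + "/sync?" + urlencode(params)


# Assumed base URL, inferred from the /tables URL in Gilles' mail.
base = "http://tapvizier.u-strasbg.fr/TAPVizieR/tap"
url = tap_sync_url(base, "SELECT schema_name, table_name FROM TAP_SCHEMA.tables")
print(url)
```

Passing the resulting URL to urllib.request.urlopen would fetch the VOTable.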
> ----
>
> Section 2.6 of the TAP-1.0 specification defines the table_name column as
>
>     "table name as it should be used in queries"
>
> The text below this adds a bit more detail to the definition, but it is
> still less specific about qualifying the table name than the equivalent
> text in the VODataService-1.1 specification
>
>     "The value of the table_name should be
>     the string that is recommended for use
>     in querying the table; it may or may not
>     be qualified by schema and catalog name(s)
>     depending on the implementation requirements."
>
> Given the current definition of 'may or may not be qualified', the table
> name in this example could be interpreted as
>
>     a schema called
>
>         'J/other/PZ/29'
>
>     a table called
>
>         '1/table'
>
> or as
>
>     a table called
>
>         'J/other/PZ/29.1/table'
>
> From context we can guess that this does in fact represent the unqualified
> table name containing a non-delimiter dot.
>
> But this is a *guess*, and is not covered by the rules for representing
> qualified or unqualified names that may or may not contain non-delimiter
> dots.
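To make the ambiguity explicit, here is a sketch (possible_readings is a made-up name) that lists every (schema, table) pair an unquoted table_name could stand for if any single dot may be the schema delimiter. Catalogs are ignored for simplicity.

```python
from typing import List, Optional, Tuple


def possible_readings(table_name: str) -> List[Tuple[Optional[str], str]]:
    """Every (schema, table) pair an unquoted table_name could mean if it
    'may or may not be qualified' and any one dot may be the delimiter."""
    # Unqualified reading: every dot belongs to the table name.
    readings: List[Tuple[Optional[str], str]] = [(None, table_name)]
    for i, ch in enumerate(table_name):
        if ch == ".":
            # Qualified reading: this dot separates schema from table.
            readings.append((table_name[:i], table_name[i + 1:]))
    return readings


for reading in possible_readings("J/other/PZ/29.1/table"):
    print(reading)
# (None, 'J/other/PZ/29.1/table')
# ('J/other/PZ/29', '1/table')
```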
>
> ----
>
> The current TAP-1.0 specification needs to be updated to describe how
> the metadata in the TAP_SCHEMA tables should use quotes to wrap names that
> contain non-delimiter dots or other characters outside the basic set of
> alphanumeric characters.
>
> ----
>
> In this example the table name in the table_name column should probably be
> wrapped in double quotes to indicate that the dot is part of the table name
> and not a delimiter between schema and table.
>
>     <TD>"J/other/PZ/29.1/table"</TD>
>
> ----
>
> For comparison, sending the same TAP_SCHEMA query to the Gavo TAP service
>
>     http://dc.zah.uni-heidelberg.de/__system__/adql/query/form
>
>     "SELECT schema_name, table_name FROM TAP_SCHEMA.tables"
>
> returns a VOTable containing
>
>     <TR>
>         <TD>twomass</TD>
>         <TD>twomass.data</TD>
>     </TR>
>
> If we apply the same parsing rules that we used for the TAPVizieR results,
> then this could refer to
>
>     a schema called
>
>         'twomass'
>
>     and a table called
>
>         'twomass.data'
>
> or this could refer to
>
>     a schema called
>
>         'twomass'
>
>     and a table called
>
>         'data'
>
> Applying the same set of parsing rules that were needed to interpret the
> TAPVizieR TAP_SCHEMA results to the Gavo TAP_SCHEMA results means that the
> table names in the Gavo TAP_SCHEMA results may be open to misinterpretation.
>
> Note - there is nothing in any of the specifications that says that we
> cannot have combinations of catalogs, schemas, tables and columns with the
> same names.
>
> Just because the table name 'twomass.data' starts with the same sub-string
> as the schema name 'twomass' does not by itself mean that 'twomass.data' is
> the qualified table name including the parent schema name and delimited by
> a dot, rather than a table name which just happens to start with the same
> sub-string as the parent schema name and contain a non-delimiting dot.
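The schema_name column can narrow the choice, but not settle it. The sketch below (consistent_readings is hypothetical) keeps only the readings that agree with schema_name: the whole string as the table name, or the remainder after a schema_name + "." prefix. For the TAPVizieR row one reading survives; for the Gavo row two do, which is the ambiguity described above.

```python
from typing import List, Optional, Tuple


def consistent_readings(
    schema_name: str, table_name: str
) -> List[Tuple[Optional[str], str]]:
    """Readings of an unquoted table_name that agree with the schema_name
    column of the same TAP_SCHEMA.tables row."""
    readings: List[Tuple[Optional[str], str]] = [(None, table_name)]  # unqualified
    prefix = schema_name + "."
    if table_name.startswith(prefix):
        # Qualified by its own schema: strip the prefix.
        readings.append((schema_name, table_name[len(prefix):]))
    return readings


print(consistent_readings("vbig", "J/other/PZ/29.1/table"))
# [(None, 'J/other/PZ/29.1/table')]
print(consistent_readings("twomass", "twomass.data"))
# [(None, 'twomass.data'), ('twomass', 'data')]
```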
>
> ----
>
> We could simplify the parsing rules by defining both the schema name and
> table name as always unqualified, removing the need for using quotes within
> the metadata.
>
>     <TR>
>         <TD>vbig</TD>
>         <TD>J/other/PZ/29.1/table</TD>
>     </TR>
>
> and
>
>     <TR>
>         <TD>twomass</TD>
>         <TD>data</TD>
>     </TR>
>
> Note - in order to use the fully unqualified schema name we would have to
> add a separate column/element to the metadata to contain the catalog name.
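With separate, always-unqualified columns, a client never has to parse a name; it only has to assemble one. A sketch, assuming the client quotes every part and that an optional catalog name would simply be passed as an extra leading part (query_name is a hypothetical helper):

```python
def query_name(*parts: str) -> str:
    """Join unqualified names (outermost first: catalog, schema, table)
    into a fully quoted, dot-delimited name for use in a query."""
    if not parts:
        raise ValueError("at least one name is required")
    # Wrap each part in double quotes, doubling any embedded quote.
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


print(query_name("vbig", "J/other/PZ/29.1/table"))
# "vbig"."J/other/PZ/29.1/table"
print(query_name("twomass", "data"))
# "twomass"."data"
```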
>
> ----
>
> We could simplify the parsing rules by making the table names always fully
> qualified and always wrap all the names in quotes.
>
>     <TR>
>         <TD>"vbig"</TD>
>         <TD>"vbig"."J/other/PZ/29.1/table"</TD>
>     </TR>
>
> and
>
>     <TR>
>         <TD>"twomass"</TD>
>         <TD>"twomass"."data"</TD>
>     </TR>
>
> Note - the schema name also needs to be quoted because schema names may be
> qualified with a catalog name and both the schema and catalog names may
> themselves contain non-delimiter dots or other non alphanumeric characters.
>
> ----
>
> We could try to define a more complex set of conditional rules which work
> for both the Gavo and TAPVizieR metadata and are compatible with the
> existing service and client implementations.
>
> ----
>
> What do you think?
> Would anyone like to have a go at defining the rules for qualified/unqualified
> quoted/unquoted names?
>
> --------
> Dave Morris
> Software Developer
> Wide Field Astronomy Unit
> Institute for Astronomy
> University of Edinburgh
> --------
>
>
>


More information about the dal mailing list