Comments on PQL spec

Wed Jul 8 21:47:17 PDT 2009

TAP and PQL.   I've been waiting for the TAP-related discussion to
settle down to the point where it could be commented on as a whole
rather than fragments and it looks like we are finally there now.
My perspective is as parochial as anyone else's: we've been supporting
generic queries against relational tables (and "table files") and
large-scale cross-comparison for over ten years now, so my evaluation
of TAP and PQL is going to be based on how well they mesh with our
tools.

This goes both ways: how easily we can wrap our tools to provide
a TAP or PQL service, and how easily we can extend our tools to use
external TAP services.  I would imagine that anyone who already has
similar services will approach the problem in the same way.

I'm starting with PQL because it satisfies the 90/10 rule (or in
this case the 99/01 rule), having all the functionality we really
use in practice in a form which is very easy to parse and work with.

I'm completely supportive of TAP but worry that it has the same
problem as SGML: it is too difficult for general implementation by
a wide community.  Just as XML provided a simpler path for the general
user/developer to work with markup languages, I think PQL will act
as a catalyst for both data suppliers and consumers to participate
more fully in the VO.

- John Good

---

Comparison with IRSA.   IRSA provides three services:  one to get
a list of catalogs (name and description); one to get the metadata
for a catalog (column name, description and various attributes);
and one to perform a SELECT search on a catalog.  Section 4.3 of
the PQL spec provides for the first two of these, using the
TAP_SCHEMA.tables and TAP_SCHEMA.columns tables.
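To make the correspondence concrete, here is a minimal sketch of how IRSA's three services might be expressed as PQL-style GET requests. The endpoint URL, the table name `my_catalog`, the TAP_SCHEMA column names, the `ra,dec` form of POS and the `column,value` form of WHERE are all assumptions for illustration. The actual parameter syntax should be checked against the PQL draft.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; not a real IRSA or TAP URL.
BASE_URL = "https://example.org/pql/sync"


def pql_url(params):
    """Encode a list of (name, value) pairs as a GET query URL.

    A list (rather than a dict) is used so that a parameter such as
    WHERE could be repeated.
    """
    return BASE_URL + "?" + urlencode(params)


# 1. List catalogs: names and descriptions from TAP_SCHEMA.tables.
list_catalogs = pql_url([
    ("FROM", "TAP_SCHEMA.tables"),
    ("SELECT", "table_name,description"),
])

# 2. Catalog metadata: column descriptions for one (hypothetical) table.
catalog_columns = pql_url([
    ("FROM", "TAP_SCHEMA.columns"),
    ("SELECT", "column_name,description,unit,ucd"),
    ("WHERE", "table_name,my_catalog"),  # assumed "column,value" syntax
])

# 3. Search: a positional query on that table (assumed ra,dec in degrees).
cone_search = pql_url([
    ("FROM", "my_catalog"),
    ("POS", "10.68,41.27"),
    ("SIZE", "0.1"),
])
```

Because all three are ordinary parameter lists, a service that already answers the third kind of request needs little extra machinery to answer the first two: they are the same query run against two metadata tables.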

The PQL parameters SELECT, (POS, SIZE / REGION), FROM and WHERE
largely overlap with our parameters, with only minor variations
(e.g., we fold object name lookup into our location parameter).
The biggest difference is in constructing the WHERE clause.  We take
an SQL fragment, which allows us to handle AND/OR constructs, whereas
PQL is restricted to ANDed range/value constraints.
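The ease of parsing ANDed range/value constraints can be shown with a minimal sketch that turns a set of WHERE values into a parameterized SQL clause. It assumes, for illustration only, that each WHERE value has the form `column,value` or `column,min/max` with either bound optionally empty. That convention is borrowed from the range syntax of other IVOA simple-access protocols, so the exact PQL grammar should be confirmed against the spec. The function names are hypothetical.

```python
import re

# Column names are restricted to simple (optionally dotted) identifiers
# so they can be placed into the SQL text; values go into bound parameters.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def parse_constraint(text):
    """Split one assumed WHERE value, 'column,range', into its parts.

    Returns (column, low, high).  For a single value low == high; an
    empty side of 'min/max' leaves that side open (None).
    """
    column, sep, rng = text.partition(",")
    column = column.strip()
    if not sep or not IDENT.match(column):
        raise ValueError(f"bad constraint: {text!r}")
    if "/" in rng:
        low, _, high = rng.partition("/")
        low, high = low.strip() or None, high.strip() or None
    else:
        low = high = rng.strip() or None
    if low is None and high is None:
        raise ValueError(f"empty range: {text!r}")
    return column, low, high


def to_sql(where_values):
    """AND together every constraint; return (clause, bound parameters).

    Values are kept as strings; a real service would convert them
    according to each column's declared datatype.
    """
    terms, params = [], []
    for text in where_values:
        column, low, high = parse_constraint(text)
        if low is not None and low == high:
            terms.append(f"{column} = ?")
            params.append(low)
            continue
        if low is not None:
            terms.append(f"{column} >= ?")
            params.append(low)
        if high is not None:
            terms.append(f"{column} <= ?")
            params.append(high)
    return " AND ".join(terms), params


clause, params = to_sql(["vmag,10/12", "dec,/30", "type,galaxy"])
```

Here `clause` is `vmag >= ? AND vmag <= ? AND dec <= ? AND type = ?` and `params` is `["10", "12", "30", "galaxy"]`. The `?` placeholders follow the qmark style used by Python's `sqlite3` module; other database drivers use different placeholder styles.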

There are good arguments for both approaches.  With the more expressive
SQL-fragment approach, very few realistic query use cases are left
unsupported.  On the other hand, range/value constraints are by far
the predominant query pattern, and the PQL construct does not
presuppose an underlying SQL-based system.  In a future version of
PQL, I would be in favor of extending the construct to allow for
more complex WHERE clauses, so long as it does not interfere with
the ease of parsing the constraint set.
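The point that the PQL construct does not presuppose SQL can be illustrated with a minimal sketch. It applies the same kind of ANDed, inclusive range constraints directly to rows read from a table file, with no database involved. The sketch assumes the constraints have already been parsed into hypothetical `(column, low, high)` triples and that rows are plain dictionaries whose values compare cleanly with the bounds. The sample rows are invented.

```python
def matches(row, constraints):
    """True if the row satisfies every (column, low, high) constraint.

    Bounds are inclusive and None leaves a side open.  The constraints
    are ANDed, so the first failure rejects the row.
    """
    for column, low, high in constraints:
        value = row.get(column)
        if value is None:
            return False
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


def select(rows, columns, constraints):
    """Project the requested columns from each row that passes."""
    return [
        {c: row[c] for c in columns}
        for row in rows
        if matches(row, constraints)
    ]


# Rows as they might come from a parsed "table file" (invented values).
rows = [
    {"name": "src1", "ra": 10.1, "dec": 41.0, "vmag": 11.2},
    {"name": "src2", "ra": 10.3, "dec": 41.5, "vmag": 14.8},
    {"name": "src3", "ra": 10.7, "dec": 40.9, "vmag": 9.6},
]

# vmag between 10 and 12 AND dec at most 41.2.
result = select(rows, ["name", "vmag"],
                [("vmag", 10, 12), ("dec", None, 41.2)])
```

Only `src1` passes both constraints. An AND-only constraint set reduces to this single loop, which is part of why it is so easy to support on top of files as well as databases.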

---

Multi-Position Support.   IRSA also has catalog positional
cross-comparison functionality which can be used in conjunction
with relational queries (including constraints involving fields in
both tables).  At present, however, the interface to this is not as
simple as the catalog searching described above.
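As a rough illustration of what a cross-table constraint means, the following brute-force sketch pairs rows from an uploaded list with catalog rows inside a search radius. It then applies an extra condition that uses fields from both rows. The sketch uses the standard haversine angular-separation formula. It is not IRSA's implementation. The field names (`id`, `ra`, `dec`, `mag`) and the sample rows are invented. A production service would also use a spatial index rather than this O(N*M) loop.

```python
import math


def separation_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(left, right, radius_deg, pair_ok=lambda a, b: True):
    """Pair each left row with every right row within radius_deg.

    pair_ok is an additional constraint that may use fields of both
    rows (a cross-table condition).  Returns (left_id, right_id, sep).
    """
    pairs = []
    for a in left:
        for b in right:
            d = separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if d <= radius_deg and pair_ok(a, b):
                pairs.append((a["id"], b["id"], d))
    return pairs


mylist = [{"id": "t1", "ra": 10.0, "dec": 41.0, "mag": 12.0}]
catalog = [
    {"id": "c1", "ra": 10.0, "dec": 41.0005, "mag": 12.3},  # 1.8" away
    {"id": "c2", "ra": 10.0, "dec": 41.0008, "mag": 15.0},  # close, but mag differs
    {"id": "c3", "ra": 11.0, "dec": 41.0, "mag": 12.1},     # far away
]

# 5 arcsec radius, plus a constraint comparing magnitudes across tables.
pairs = cross_match(mylist, catalog, 5 / 3600,
                    pair_ok=lambda a, b: abs(a["mag"] - b["mag"]) < 1.0)
```

Both `c1` and `c2` fall inside the radius, but only `c1` survives the magnitude condition. The PQL multi-position query, as described next, covers the positional part but not the `pair_ok` part.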

The proposed PQL multi-position query (Section 4.2) supports some
of this but without the cross-table constraints.  It is also a
little unclear how the user would upload a table for comparison.
The implication seems to be that one would use an HTTP POST to upload
a table with a user-supplied name (e.g. "mylist") and then use a
construct like "POS=@TAP_UPLOAD.mylist" to tell the service to use
that table for cross-comparison.
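Under that reading, a client request might be assembled as shown below. This is a sketch only. The endpoint, the form-field name `mylist`, the CSV encoding of the list and the SIZE value are assumptions. A real TAP service may also require an explicit upload parameter that names the part. The multipart body is hand-encoded with the standard library so the example runs without third-party packages. The request is built but not sent.

```python
import uuid
from urllib.request import Request

# Hypothetical endpoint; not a real service URL.
ENDPOINT = "https://example.org/pql/sync"


def multipart_body(fields, files):
    """Encode (name, value) text fields and (name, filename, bytes) files
    as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        parts.append(header.encode() + str(value).encode() + b"\r\n")
    for name, filename, data in files:
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"; '
                  f'filename="{filename}"\r\n'
                  "Content-Type: text/csv\r\n\r\n")
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Invented two-row position list, sent as CSV (an assumption; a real
# service might expect VOTable instead).
mylist = b"ra,dec\n10.68,41.27\n83.82,-5.39\n"

body, content_type = multipart_body(
    fields=[("FROM", "my_catalog"),
            ("POS", "@TAP_UPLOAD.mylist"),
            ("SIZE", "0.01")],
    files=[("mylist", "mylist.csv", mylist)],
)
request = Request(ENDPOINT, data=body, method="POST",
                  headers={"Content-Type": content_type})
# urllib.request.urlopen(request) would send it; that step is omitted here.
```

The open question in the spec is precisely which of these pieces (the part name, the table format and any extra upload parameter) a client must supply. Spelling that out would remove the guesswork.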

---

In summary, while there are a few limitations which should probably
be addressed in future versions, the PQL spec as written provides
support for an overwhelming majority of real-world queries (based
on our operational experience) and will be very easy to implement
on top of existing databases and file systems.

With a little more support for geometric overlap, I would personally
prefer PQL (and/or TAP) over the custom protocols for dealing with
image and spectral metadata.