Rethink the Constraint-based search Query from Registry interface

Thu Apr 7 15:28:09 PDT 2005

Hey Paul,

On Thu, 7 Apr 2005, Paul Harrison wrote:
> OK - I will come straight out with it - I think that "2.1 
> Constraint-based Search Query", the ADQL/XPath based registry query 
> interface is an ugly compromise that suits no-body.

Okay, you're a brave soul.  

Before we get too deep into what's wrong with ADQL and XPath, let's step 
back and look at requirements and constraints which presumably led to this 
choice.  

Here's what I've extracted from Paul's last paragraph:

PH.1.  We should be able to form complex queries described by:
         o  constraints on specific attributes of the resource record
         o  boolean expressions for combining constraints
PH.2.  It should be human readable
PH.3.  It should be simple (as simple as possible) but with the same 
         semantics as is currently outlined in the RI spec.

Here are some other requirements I think we need:

RP.1.  It should be straight-forward to support using commonly used 
       database technologies 
RP.1.1.  It should be straight-forward to support with both relational 
            and XML databases.
RP.1.2.  It should be straight-forward to convert to local query languages
            including XQuery and local variations of SQL.
RP.1.3.  It should be easy to parse in multiple, commonly-used languages.

RP.2.  It should be able to support the VOResource (+extensions) data 
       model.
RP.2.1.  The query language should not include a definition of the data 
         model (i.e. the keywords that are used to form constraints).  
RP.2.2.  The query language specification should not need updating if the 
         data model is changed or updated.  
RP.2.3.  The query language should not require the use of specific 
         attribute names internal to the registry (i.e. allow the use of 
         RDB and XDB).
RP.2.4.  There should be a clear connection between attribute names that go 
         into the input query and the values that are returned in the 
         result (which is XML using VOResource).  

RP.3.  Constraints should support comparison operators appropriate for the 
       type of data.  
RP.3.1.  For string values, comparison operators should include at a 
         minimum:
           o  equals
           o  contains
           o  starts with
           o  ends with
RP.3.2.  Case-independent comparisons must be possible for string values.
RP.3.3.  For numeric types, comparison operators should include at a 
         minimum:
           o  equals
           o  less than
           o  less than or equal
           o  greater than
           o  greater than or equal
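
As a rough illustration of RP.3, the operator lists above could map onto 
parameterized SQL along the following lines.  This is only a sketch: the 
function names, the operator spellings used as keys, and the choice of 
LIKE with LOWER() are assumptions made for illustration and are not taken 
from the RI spec or from ADQL.

```python
# Sketch only: operator keys and helper names are hypothetical.
LIKE_PATTERNS = {
    "contains": "%{}%",
    "starts with": "{}%",
    "ends with": "%{}",
}

NUMERIC_OPS = {
    "equals": "=",
    "less than": "<",
    "less than or equal": "<=",
    "greater than": ">",
    "greater than or equal": ">=",
}


def string_constraint(column, op, value, ignore_case=False):
    """Return (sql_fragment, parameter) for one string comparison (RP.3.1)."""
    # RP.3.2: fold both sides to lower case for case-independent matching.
    # How LOWER() treats non-ASCII text depends on the database.
    lhs = f"LOWER({column})" if ignore_case else column
    if ignore_case:
        value = value.lower()
    if op == "equals":
        return f"{lhs} = ?", value
    if op not in LIKE_PATTERNS:
        raise ValueError(f"unknown string operator: {op!r}")
    # Escape LIKE wildcards so the user's text is matched literally.
    for ch in ("\\", "%", "_"):
        value = value.replace(ch, "\\" + ch)
    return f"{lhs} LIKE ? ESCAPE '\\'", LIKE_PATTERNS[op].format(value)


def numeric_constraint(column, op, value):
    """Return (sql_fragment, parameter) for one numeric comparison (RP.3.3)."""
    if op not in NUMERIC_OPS:
        raise ValueError(f"unknown numeric operator: {op!r}")
    return f"{column} {NUMERIC_OPS[op]} ?", value
```

Returning the value as a separate parameter, rather than pasting it into 
the SQL text, leaves quoting to the database driver.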

RP.4.    Users should be able to form constraints based on coverage 
         easily.  (Ex: return resources that cover this region of sky.)

(Some may complain about weasel words like "should", "easy", and 
"straight-forward"; while it is true these are difficult to test, we can 
evaluate at some level different choices based on which are easier.  If we 
were doing this formally, we would recast these in more concrete terms.)

Now, just to highlight how we got to section 2.1 as it is now.  The 
advantages of ADQL:
  o  the XML format means that it is broadly parseable in many languages
       with existing tools.  
  o  it has been demonstrated to be convertible to both XQuery and SQL 
       (with technologies like XSLT).
  o  through its SQL roots, it provides all the capabilities we need in 
       terms of operators and support for different value types.
  o  it is intended to support region-based queries (using STC); we can 
       leverage both the STC model and emerging software to support it.  
  o  it provides a potential point of interoperability with other services 
       that use ADQL.
  o  there is ADQL/s for human viewing.  

The use of restricted XPath was motivated by:
  o  standard attribute names do not need to be defined specially for the 
     query language; they come directly from the XML entities of documents 
     being searched.  (Thus, there's a direct connection between what you 
     ask for and what you get back.)
  o  restricted XPaths are simply long keyword names; this means that a 
     simple lookup can be used to map them to internal attribute names.  
     No internal parsing is needed.  (Thus, they translate easily into 
     both SQL and XQuery queries.)
  o  they support "non-standard" VOResource extensions just as well as 
     "standard" ones.
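
To make the "simple lookup" point concrete, here is a minimal sketch of 
how a registry might map restricted XPaths to its own attribute names.  
The paths shown, the table and column names, and the "$r" variable are 
all made up for illustration; none of them come from the RI spec.

```python
# Hypothetical mapping for a relational registry: restricted XPath -> column.
XPATH_TO_COLUMN = {
    "title": "resource.title",
    "curation/publisher": "resource.publisher",
    "content/subject": "subject.value",
}


def to_sql_column(xpath):
    """Translate a restricted XPath by dictionary lookup; the path is not parsed."""
    key = xpath.strip().strip("/")
    try:
        return XPATH_TO_COLUMN[key]
    except KeyError:
        raise ValueError(f"unsupported attribute path: {xpath!r}") from None


def to_xquery_path(xpath, root="$r"):
    """For an XML database the same path can be used almost as-is."""
    return f"{root}/{xpath.strip().strip('/')}"
```

For example, to_sql_column("curation/publisher") gives 
"resource.publisher" in this sketch, while to_xquery_path() simply 
prefixes the same path with the variable bound to each resource record.  
Supporting an extension would then mean adding entries to the table, not 
changing the query language.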

Now there may be other choices that satisfy the above requirements.  If
anyone wishes to propose one, let's be sure that it addresses them, so 
that we do not rehash the same discussions that led us to ADQL.

cheers,
Ray