Rethink the Constraint-based search Query from Registry interface
Ray Plante
rplante at ncsa.uiuc.edu
Thu Apr 7 15:28:09 PDT 2005
Hey Paul,
On Thu, 7 Apr 2005, Paul Harrison wrote:
> OK - I will come straight out with it - I think that "2.1
> Constraint-based Search Query", the ADQL/XPath based registry query
> interface is an ugly compromise that suits no-body.
Okay, you're a brave soul.
Before we get too deep into what's wrong with ADQL and XPath, let's step
back and look at requirements and constraints which presumably led to this
choice.
Here's what I've extracted from Paul's last paragraph:
PH.1. We should be able to form complex queries described by:
o constraints on specific attributes of the resource record
o boolean expressions for combining constraints
PH.2. It should be human readable
PH.3. It should be simple (as simple as possible) but with the same
semantics as is currently outlined in the RI spec.
Here are some other requirements I think we need:
RP.1. It should be straight-forward to support using commonly used
database technologies
RP.1.1. It should be straight-forward to support with both relational
and XML databases
RP.1.2. It should be straight-forward to convert to local query languages
including XQuery and local variations of SQL.
RP.1.3. It should be easy to parse in multiple, commonly-used languages
RP.2. It should be able to support the VOResource (+extensions) data
model.
RP.2.1. The query language should not include a definition of the data
model (i.e. the keywords that are used to form constraints).
RP.2.2. The query language specification should not need updating if the
data model is changed or updated.
RP.2.3. The query language should not require the use of specific attribute
names internal to the registry (i.e. it should allow the use of both
RDB and XDB).
RP.2.4. There should be a clear connection between the attribute names that
go into the input query and the values that are returned in the
result (which is XML using VOResource).
RP.3. Constraints should support comparison operators appropriate for the
type of data.
RP.3.1. For string values, comparison operators should include at a
minimum:
o equals
o contains
o starts with
o ends with
RP.3.2. Case-independent comparisons must be possible for string values.
RP.3.3. For numeric types, comparison operators should include at a
minimum:
o equals
o less than
o less than or equal
o greater than
o greater than or equal
RP.4. Users should be able to form constraints based on coverage
easily. (Ex: return resources that cover this region of sky.)
(Some may complain about weasel words like "should", "easy", and
"straight-forward"; while it is true these are difficult to test, we can
evaluate at some level different choices based on which are easier. If we
were doing this formally, we would recast these in more concrete terms.)
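To make RP.3.1 through RP.3.3 a bit more concrete, here is a rough Python sketch of how a registry backed by a relational database might turn one constraint into a parameterized SQL fragment. Everything named here is an assumption for illustration: the operator keywords, the function names, and the use of LOWER() for case-independent matching are not taken from the RI spec or from ADQL.

```python
# Hypothetical operator keywords, chosen for illustration only.
STRING_PATTERNS = {
    "equals": "{}",
    "contains": "%{}%",
    "starts_with": "{}%",
    "ends_with": "%{}",
}
NUMERIC_OPERATORS = {"equals": "=", "lt": "<", "le": "<=", "gt": ">", "ge": ">="}


def escape_like(value):
    """Escape LIKE wildcards so the user's text is matched literally."""
    for ch in ("\\", "%", "_"):  # backslash first, so it is not escaped twice
        value = value.replace(ch, "\\" + ch)
    return value


def string_constraint(column, op, value, ignore_case=True):
    """Return (sql_fragment, params) for a string comparison (RP.3.1, RP.3.2)."""
    pattern = STRING_PATTERNS[op].format(escape_like(value))
    if ignore_case:
        # Lower-case both sides; one simple way to get case independence.
        return "LOWER({}) LIKE ? ESCAPE '\\'".format(column), [pattern.lower()]
    # Note: whether plain LIKE is case-sensitive depends on the database.
    return "{} LIKE ? ESCAPE '\\'".format(column), [pattern]


def numeric_constraint(column, op, value):
    """Return (sql_fragment, params) for a numeric comparison (RP.3.3)."""
    return "{} {} ?".format(column, NUMERIC_OPERATORS[op]), [value]
```

The value is always passed as a bound parameter rather than pasted into the SQL text, so the same fragment can be combined with others using AND/OR (PH.1) without quoting problems. The column argument is assumed to come from a trusted internal lookup, not from the user.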
Now, just to highlight how we got to section 2.1 as it is now. The
advantages of ADQL:
o the XML format means that it is broadly parseable in many languages
with existing tools.
o it has been demonstrated to be convertible to both XQuery and SQL (with
technologies like XSLT).
o through its SQL roots, it provides all the capabilities we need in terms
of operators and support for different value types.
o it is intended to support region-based queries (using STC); we can
leverage both the STC model and emerging software to support it.
o it provides a potential point of interoperability with other services
that use ADQL.
o there is ADQL/s for human viewing.
The use of restricted XPath was motivated by:
o standard attribute names do not need to be defined specially for the
query language; they come directly from the XML entities of documents
being searched. (Thus, there's a direct connection between what you
ask for and what you get back.)
o restricted XPaths are simply long keyword names; this means that a
simple lookup can be used to map them to internal attribute names.
No internal parsing is needed. (Thus, they translate easily into
both SQL and XQuery queries.)
o they support "non-standard" VOResource extensions equally well as
"standard" ones.
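As a rough illustration of the "simple lookup" point, here is a minimal Python sketch. The table and column names on the right-hand side are made up, the three paths are only examples, and the function names are hypothetical; a real registry would build this table from its own schema.

```python
# Restricted XPath (as it appears in the query) -> internal column name.
# The column names are hypothetical; each registry would supply its own.
XPATH_TO_COLUMN = {
    "title": "resource.title",
    "curation/publisher": "resource.publisher",
    "content/subject": "subject.value",
}


def to_column(xpath):
    """Map a restricted XPath to an internal SQL column by plain lookup."""
    try:
        return XPATH_TO_COLUMN[xpath]
    except KeyError:
        raise ValueError("unsupported attribute: " + xpath)


def to_xquery_path(xpath, root="$r"):
    """For an XML database, the path is used as-is under a bound variable."""
    return "{}/{}".format(root, xpath)
```

For example, to_column("curation/publisher") gives "resource.publisher" for a relational back end, while to_xquery_path("curation/publisher") gives "$r/curation/publisher" for an XML one. In neither case does the registry have to parse the path. Supporting an extension schema just means adding entries to the table.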
Now there may be other choices that satisfy the above requirements. If
anyone wishes to propose one, let's be sure that we address these
requirements and not rehash the same discussions that led us to ADQL.
cheers,
Ray