Minutes MWG 2003-09-25

Thomas McGlynn tam at lheapop.gsfc.nasa.gov
Fri Oct 3 07:50:31 PDT 2003


I think this discussion is getting sidetracked here.
We've been given an example query in both XQuery
and SQL:

XQuery

   <query>
   { for $b in document("nvo.caltech.edu/registry.xml")/VOResource
     where $b/Curation/Creator="Messier" and $b/@date<1800
     return <resource year="{$b/@date}"> {$b/Curation/Title} </resource>
   }
   </query>

SQL

   select a.resource_id, a.resource_title, a.resource_date
   from   Resources a, Resources_Curators b
   where  a.resource_id = b.resource_id
     and  b.creator = 'Messier'

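To make the SQL side concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names come from the example above, but the column types and the two sample rows are hypothetical, invented only to show what the join returns. The string literal uses single quotes, which standard SQL requires (double quotes delimit identifiers).

```python
import sqlite3

# Hypothetical schema: table/column names follow the SQL example; the types
# are assumptions, since the thread did not specify a schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Resources (
        resource_id    INTEGER PRIMARY KEY,
        resource_title TEXT,
        resource_date  INTEGER
    );
    CREATE TABLE Resources_Curators (
        resource_id INTEGER REFERENCES Resources(resource_id),
        creator     TEXT
    );
""")

# Hypothetical rows, for illustration only.
conn.executemany("INSERT INTO Resources VALUES (?, ?, ?)", [
    (1, "Catalogue A", 1781),
    (2, "Catalogue B", 1888),
])
conn.executemany("INSERT INTO Resources_Curators VALUES (?, ?)", [
    (1, "Messier"),
    (2, "Other"),
])

# The example query: join the two tables on resource_id, then keep only
# rows whose creator is Messier.
rows = conn.execute("""
    select a.resource_id, a.resource_title, a.resource_date
    from   Resources a, Resources_Curators b
    where  a.resource_id = b.resource_id
      and  b.creator = 'Messier'
""").fetchall()
print(rows)  # [(1, 'Catalogue A', 1781)]
```

Note that, unlike the XQuery version, this SQL does not filter on date; under this hypothetical schema, adding `and a.resource_date < 1800` would make the two queries select the same records.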

Tony commented:

> 
> The point is that this only works if you have a table called Resources and
> another called Resources_Curators. Our goal was *not* to mandate the way a
> registry operated or was structured, only the core metadata and the way that
> different registries interoperated.

but I think this misses the point.  The XQuery version also needed
to know where to search for the records of interest.  Alberto chose
to split the information into multiple tables, but joins across documents or
XML elements are likely to be needed in XQuery queries as well.  [Though I'm
still in the dark about exactly how that is specified.]  Each query readily
breaks down into three directly comparable elements: what to query,
what to return, and what criteria the returned records must match.
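The three-part breakdown can be sketched in code. Everything below is hypothetical and purely illustrative: `RegistryQuery`, `to_sql`, `to_xquery`, and the name mappings are invented for this example. It assumes a single flat `Resources` table (not Alberto's two-table schema) and does no quoting or escaping of literals. It shows one storage-neutral query rendered into each language, with the backend-specific names (where to search, which columns or paths) supplied separately.

```python
from dataclasses import dataclass

# Hypothetical storage-neutral query with the three parts named above.
@dataclass
class RegistryQuery:
    source: str                            # what to query
    returns: list[str]                     # what to return
    criteria: list[tuple[str, str, str]]   # what records must match: (field, op, literal)

def to_sql(q: RegistryQuery, names: dict[str, str]) -> str:
    # names maps abstract names to a table/columns in an assumed flat schema.
    cols = ", ".join(names[f] for f in q.returns)
    where = " and ".join(f"{names[f]} {op} {lit}" for f, op, lit in q.criteria)
    return f"select {cols} from {names[q.source]} where {where}"

def to_xquery(q: RegistryQuery, names: dict[str, str]) -> str:
    # names maps abstract names to a document and to paths below each record.
    ret = " ".join(f"{{$b/{names[f]}}}" for f in q.returns)
    where = " and ".join(f"$b/{names[f]} {op} {lit}" for f, op, lit in q.criteria)
    return f"for $b in {names[q.source]} where {where} return <resource>{ret}</resource>"

q = RegistryQuery(
    source="resources",
    returns=["title"],
    # Single-quoted literals are valid string literals in both SQL and XQuery.
    criteria=[("creator", "=", "'Messier'"), ("date", "<", "1800")],
)

# Hypothetical backend-specific name mappings.
sql_names = {"resources": "Resources", "title": "resource_title",
             "creator": "creator", "date": "resource_date"}
xq_names = {"resources": 'document("nvo.caltech.edu/registry.xml")/VOResource',
            "title": "Curation/Title", "creator": "Curation/Creator", "date": "@date"}

print(to_sql(q, sql_names))
# select resource_title from Resources where creator = 'Messier' and resource_date < 1800
print(to_xquery(q, xq_names))
```

The design choice here is that only the name mappings are backend-specific; the query object itself says nothing about whether the registry is relational or XML.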

As Tony says, we don't want to specify the internal structure of registries.
And that means we don't want to specify that they are stored as XML or as relational
databases.  The role of the query language is to convey the semantics of
the query.  If we use XQuery and someone has chosen to implement a database
in SQL, then somewhere that query needs to be translated into SQL.
If the database is XML documents and our standard specifies an SQL format,
then the SQL will need to be converted to XQuery.  We cannot get
around this...  It's pretty clear that some registries are going
to be implemented in relational databases -- it's already happened.
And it's pretty clear that some databases are going to be implemented
in XML -- I think that has already happened too!
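For comparison, a registry stored as XML can answer the same question natively. The sketch below uses Python's standard `xml.etree.ElementTree` to mimic the example FLWOR expression; it only illustrates the query's semantics and is not an XQuery engine. The element and attribute names follow the XQuery example, but the registry document, its records, and the function name `messier_before_1800` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-backed registry. Element/attribute names follow the
# XQuery example (VOResource, Curation/Creator, Curation/Title, @date);
# the records themselves are invented for illustration.
REGISTRY = """
<registry>
  <VOResource date="1781">
    <Curation><Creator>Messier</Creator><Title>Catalogue A</Title></Curation>
  </VOResource>
  <VOResource date="1888">
    <Curation><Creator>Other</Creator><Title>Catalogue B</Title></Curation>
  </VOResource>
</registry>
"""

def messier_before_1800(xml_text: str) -> list[tuple[str, str]]:
    """Mimic the example FLWOR: for each VOResource whose Curation/Creator
    is 'Messier' and whose @date is below 1800, return (year, title)."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for b in root.findall("VOResource"):                   # for $b in .../VOResource
        if (b.findtext("Curation/Creator") == "Messier"    # where ... and
                and int(b.get("date")) < 1800):            # $b/@date < 1800
            results.append((b.get("date"), b.findtext("Curation/Title")))  # return
    return results

print(messier_before_1800(REGISTRY))  # [('1781', 'Catalogue A')]
```

A relational registry would answer the same question with a join instead; either way, the backend has to map the shared query onto its own storage.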

The question that I think we need to be addressing is which specification
most cleanly and clearly describes the kind of query we want.  Not because
it needs to be human readable, but because we are going to be writing
query engines that generate these queries and may need query parsers to
read them.  This software will be simpler and more reliable if the target
language is simple and clear.  Human readability is a plus, but more as
an indicator.  Humans are quite good at parsing languages, and if we are
having trouble, then it's likely that the software we write to generate
and parse the language will also have problems.

The trivial query given above is essentially the simplest interesting query,
though I'd anticipate that a good fraction of real queries will be of comparable
complexity.  The real discriminators are the more complex queries that we want
registries to handle.  I suggested a few, with SQL versions, in an earlier
e-mail.  I don't know what the XQuery equivalents would look like.  Is it easier
to translate from SQL to XQuery or vice versa?  I don't know, but we're going
to have to do one or the other, or both!

One final thought.  Should we query registries with a different protocol than the
one we use for simple table queries?  Certainly we can make that choice.  However,
if we do, I think we need a clear understanding of why we have chosen to define
two distinct protocols.  From my perspective they are very similar
operations, but others may disagree.


	Regards,
	Tom McGlynn



More information about the registry mailing list