Is XPATH the way to search a data model?
Brian Thomas
brian.thomas at gsfc.nasa.gov
Mon May 17 14:16:01 PDT 2004
Hi David,
On Monday 17 May 2004 04:35 pm, David Berry wrote:
> Obviously, the ability to search our data models will be very important,
> but should we just assume that XPATH is the best way to do it?
Whether or not XPath (http://www.w3.org/TR/xpath20) is the best way,
it is an accepted standard for specifying parts of an XML document, so
we have to plan for allowing it to operate on our serializations.
XQuery (http://www.w3.org/TR/xquery) incorporates XPath,
and is the more robust search solution. Unfortunately, right now, the
eXist database project (http://exist.sourceforge.net) is the only freely
available implementation of XQuery (but it's a pretty good implementation).
> My question
> is, should we be optimising our data models specifically so that they can
> be searched using XPATH?
"Yes" (if you mean XQuery + XPath).
> This seems to be the general assumption, but I
> have two questions with this:
>
> 1) Does not the fact that XPATH is a specifically XML thing not mean that
> it is more to do with data >formats< than data >models<? Fine if you
> serialise your data as canonical XML but what if you use (e.g.)
> stand-alone FITS files, or in some non-canonical XML format for which your
> XPATH expressions are not valid?
Simple XPath "grabs" can be made independent of the data format involved. For
example, a search for a "bandPass" node can be made on any document
you like, knowing only that it may have that structure somewhere within it, e.g.
//bandPass
An XQuery to pull out all the bandPass nodes would look like:
<results> {
for $node in //bandPass
return $node
} </results>
which might return
<results>
<bandPass>...</bandPass>
<bandPass>...</bandPass>
...
</results>
So you can grab nodes irrespective of where they occur in the document.
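For anyone without an XQuery engine to hand, the same grab can be sketched with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. This is only an illustration: the sample document, and every element name in it other than bandPass, is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the bandPass name comes from the example above.
DOC = """
<observation>
  <bandPass>V</bandPass>
  <detector>
    <filter>
      <bandPass>R</bandPass>
    </filter>
  </detector>
</observation>
"""


def grab_band_passes(xml_text):
    """Return every bandPass element below the root, wrapped in <results>."""
    root = ET.fromstring(xml_text)
    results = ET.Element("results")
    # ".//bandPass" is ElementTree's spelling of the XPath //bandPass,
    # evaluated relative to the root element.
    for node in root.findall(".//bandPass"):
        results.append(node)
    return results


if __name__ == "__main__":
    print(ET.tostring(grab_band_passes(DOC), encoding="unicode"))
```

Both bandPass nodes come back in document order, even though they sit at different depths.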
>
> 2) Can it have astronomical knowledge built into it, or is it just a
> sort of dumb regexp system for structured text? What I mean is, if for
> instance you searched a StandardQuantity for a Frame (Frame "A") holding
> the 3 axes:
>
> (heliocentric radio velocity, ICRS RA, ICRS Dec)
>
> could XPATH do anything sensible if the StdQ did not contain this exact
> Frame, but instead contained a Frame (Frame "B") containing the 3 axes:
>
> (Galactic longitude, geocentric frequency, galactic latitude)
Yes, you can limit the match based on child nodes, e.g. the XPath
//standardQuantity[CoordQuantity/axes/helioRadioVelocity]
will only grab a standardQuantity that has a heliocentric radio velocity axis. To do this
search "right", XQuery is called for, e.g.
for $quantity in //standardQuantity
where $quantity/CoordQuantity/axes/helioRadioVelocity
and $quantity/CoordQuantity/axes/RA
and $quantity/CoordQuantity/axes/DE
return $quantity
would return only quantities that have heliocentric radio velocity, ICRS RA and ICRS Dec
axes in the frame; all other quantities would be ignored. If you need to
process equivalent frames (e.g. the axes are different, but could be converted via
a mapping), you can select those quantities and pass them off to
an application that knows how to do the conversion.
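As a rough sketch of that split (exact matches on one side, everything else handed on to an application), here is the same filter written against Python's standard xml.etree.ElementTree. The serialization is hypothetical: the id attributes and the galLongitude, geoFrequency and galLatitude element names are invented for the example, and what the application does with the non-matching quantities is left open.

```python
import xml.etree.ElementTree as ET

# Hypothetical serialization. The standardQuantity, CoordQuantity, axes,
# helioRadioVelocity, RA and DE names follow the query above; the id
# attributes and the Galactic/frequency axis names are made up.
DOC = """
<model>
  <standardQuantity id="A">
    <CoordQuantity><axes>
      <helioRadioVelocity/><RA/><DE/>
    </axes></CoordQuantity>
  </standardQuantity>
  <standardQuantity id="B">
    <CoordQuantity><axes>
      <galLongitude/><geoFrequency/><galLatitude/>
    </axes></CoordQuantity>
  </standardQuantity>
</model>
"""

REQUIRED_AXES = ("helioRadioVelocity", "RA", "DE")


def split_quantities(xml_text, required=REQUIRED_AXES):
    """Split quantities into those holding every required axis and the rest."""
    root = ET.fromstring(xml_text)
    exact, others = [], []
    for quantity in root.findall(".//standardQuantity"):
        # find() returns None when the path matches nothing.
        has_all = all(
            quantity.find("CoordQuantity/axes/" + axis) is not None
            for axis in required
        )
        (exact if has_all else others).append(quantity)
    return exact, others


if __name__ == "__main__":
    exact, others = split_quantities(DOC)
    print("exact match:", [q.get("id") for q in exact])
    print("hand to application:", [q.get("id") for q in others])
```

The query side only tests for the presence of named nodes; deciding whether Frame "B" is equivalent to Frame "A" would be the job of whatever receives the second list.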
>
> My questions may have revealed that I have little experience with
> XPATH, but more experience with creating customised intelligent searching
> code such as that outlined above. Hence the questions...
>
Hope that helps.
=b.t.
> David
--
* Dr. Brian Thomas
* Dept of Astronomy/University of Maryland-College Park
* Code 630.1/Goddard Space Flight Center-NASA
* fax: (301) 286-1775
* phone: (301) 286-6128 [GSFC]
(301) 405-2312 [UMD]