Is XPATH the way to search a data model?

Tue May 18 03:13:35 PDT 2004

Brian,

> > Obviously, the ability to search our data models will be very important,
> > but should we just assume that XPATH is the best way to do it?
>
> 	Whether or not XPath (http://www.w3.org/TR/xpath20) is the best way,
> 	it is an accepted standard for specifying parts of an XML document, so
> 	we have to plan for allowing it to operate on our serializations.

I suppose I am asking what benefits there are in using dumb, general-purpose
XML tools to search the serialisation, rather than searching the data model
itself with code that knows about the problem domain of the data model. The
responses of Pierre and Gerard seem to suggest I am not alone in this.

> > My question
> > is, should we be optimising our data models specifically so that they can
> > be searched using XPATH?
>
> 	"Yes" (if you mean XQuery + XPath).

Could you expand on why?

> > This seems to be the general assumption, but I
> > have two questions with this:
> >
> > 1) Does not the fact that XPATH is a specifically XML thing mean that
> > it is more to do with data >formats< than data >models<? Fine if you
> > serialise your data as canonical XML, but what if you use (e.g.)
> > stand-alone FITS files, or some non-canonical XML format for which your
> > XPATH expressions are not valid?
>
> 	Simple XPath "grabs" can be made independent of the data format involved. For
> 	example, a search for a "bandPass" node, can be made on any document
> 	you like, knowing that it may have that structure within it,

But does this not assume that the different data formats all use "bandPass"
nodes in the same way? My point was that if some non-canonical XML format
stores a bandpass under some node name other than "bandPass", then an XPath
search for "bandPass" would not find it.
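
To make the worry concrete, here is a minimal sketch in Python, using the
standard-library ElementTree module (which supports a small subset of XPath).
The two documents and the "filter" element name are invented for illustration;
only "bandPass" comes from the discussion above.

```python
import xml.etree.ElementTree as ET

# Two hypothetical serialisations of the same information.
# The element names "observation" and "filter" are assumptions for this sketch.
canonical = ET.fromstring(
    "<observation><bandPass>V</bandPass></observation>"
)
non_canonical = ET.fromstring(
    "<observation><filter>V</filter></observation>"
)


def find_bandpass(doc):
    """Return the text of every 'bandPass' element below the document root."""
    return [node.text for node in doc.findall(".//bandPass")]


canonical_hits = find_bandpass(canonical)
non_canonical_hits = find_bandpass(non_canonical)
print(canonical_hits, non_canonical_hits)
```

The same path expression finds the bandpass in the first document and nothing
in the second, even though both carry the same information.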

I guess one answer is, as Ed suggests for the case of FITS, "convert
everything to canonical XML before searching".

> >
> > 2) Can it have astronomical knowledge built into it, or is it just a
> > sort of dumb regexp system for structured text? What I mean is, if for
> > instance you searched a StandardQuantity for a Frame (Frame "A") holding
> > the 3 axes:
> >
> > (heliocentric radio velocity, ICRS RA, ICRS Dec)
> >
> > could XPATH do anything sensible if the StdQ did not contain this exact
> > Frame, but instead contained a Frame (Frame "B") containing the 3 axes:
> >
> > (Galactic longitude, geocentric frequency, galactic latitude)
>
> 	Yes, you can limit the match based on the child node, e.g. the XPath
>
> 	//standardQuantity[CoordQuantity/axes/helioRadioVelocity]
>
> 	will only grab a standardQuantity with a heliocentric radio velocity axis. To do this
> 	search "right", XQuery is called for, e.g.
>
> 	for $quantity in //standardQuantity
> 	where $quantity[CoordQuantity/axes/helioRadioVelocity]
> 	      and $quantity[CoordQuantity/axes/RA]
> 	      and $quantity[CoordQuantity/axes/DE]
> 	return $quantity
>
> 	would only return quantities that *had* heliocentric radio velocity, ICRS RA, ICRS Dec
> 	in the frame, and all other quantities would be ignored.

>       If you need to process equivalent frames (e.g. axes are different,
>       but could be converted via a mapping), you can select those and
>       pass them off to an application...

So your answer seems to be no: XPath/XQuery cannot do intelligent
searching of the type I described, since you need some other application
to do it. Above you say "you can select those...", but how would this
selection be done, and who would do it (i.e. who is the "you" in your
text)? Where would you put the intelligence which allows the data model
searching system to realise that Frame A can be derived from Frame B, and
which allows the Mapping between them to be constructed?
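
To illustrate the kind of intelligence I mean, here is a rough sketch in
Python. The axis names and the AXIS_DOMAIN table are hypothetical, and the
"derivable" test is deliberately crude: it only checks that two Frames sample
the same physical domains, and says nothing about how the Mapping itself
would be built.

```python
from collections import Counter

# Hypothetical domain knowledge: which physical domain each axis samples.
# None of these names come from an actual data model.
AXIS_DOMAIN = {
    "helioRadioVelocity": "spectral",
    "geoFrequency": "spectral",
    "icrsRA": "sky",
    "icrsDec": "sky",
    "galacticLongitude": "sky",
    "galacticLatitude": "sky",
}


def exact_match(wanted, available):
    """Name-based test: the two Frames must hold exactly the same axes."""
    return set(wanted) == set(available)


def derivable(wanted, available):
    """Domain-aware test: both Frames sample the same domains, the same
    number of times each. Raises KeyError for an axis not in AXIS_DOMAIN."""
    wanted_domains = Counter(AXIS_DOMAIN[axis] for axis in wanted)
    available_domains = Counter(AXIS_DOMAIN[axis] for axis in available)
    return wanted_domains == available_domains


frame_a = ("helioRadioVelocity", "icrsRA", "icrsDec")
frame_b = ("galacticLongitude", "geoFrequency", "galacticLatitude")

name_based = exact_match(frame_a, frame_b)
domain_based = derivable(frame_a, frame_b)
print(name_based, domain_based)
```

A purely name-based search rejects Frame B, while the domain-aware test accepts
it as a candidate. My question is where a table like AXIS_DOMAIN would live if
the search itself is done with XPath.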

David