Is XPATH the way to search a data model?

Mon May 17 14:16:01 PDT 2004

	Hi David,

On Monday 17 May 2004 04:35 pm, David Berry wrote:
> Obviously, the ability to search our data models will be very important,
> but should we just assume that XPATH is the best way to do it? 

	Whether or not XPath (http://www.w3.org/TR/xpath20) is the best way, 
	it is an accepted standard for specifying parts of an XML document, so
	we have to plan for allowing it to operate on our serializations.

	XQuery (http://www.w3.org/TR/xpath-datamodel) incorporates XPath
	and is the more robust search solution. Unfortunately, right now the
	eXist project database (http://exist.sourceforge.net) is the only freely
	available implementation of XQuery (but it's a pretty good implementation).

> My question
> is, should we be optimising our data models specifically so that they can
> be searched using XPATH? 

	"Yes" (if you mean XQuery + XPath).

> This seems to be the general assumption, but I
> have two questions with this:
>
> 1) Does not the fact that XPATH is a specifically XML thing not mean that
> it is more to do with data >formats< than data >models<? Fine if you
> serialise your data as canonical XML but what if you use (e.g.)
> stand-alone FITS files, or in some non-canonical XML format for which your
> XPATH expressions are not valid?

	Simple XPath "grabs" can be made independent of the data format involved. For
	example, a search for a "bandPass" node can be made on any document
	you like that may contain that structure, e.g.

	//bandPass

	An XQuery to pull out all the bandPass nodes would look like:

	<results> {
	for $node in //bandPass
	return  $node 
	} </results>

	which might return 

	<results>
		<bandPass>...</bandPass>
		<bandPass>...</bandPass>
		...
	</results>

	So you can grab nodes irrespective of where they occur in the document.
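
	As a rough illustration of the same grab outside an XQuery engine, here is a
	minimal Python sketch using the standard library's ElementTree, which supports
	a small subset of XPath. The sample document and the grab_all helper are
	made up for this example; only the bandPass element name comes from above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: bandPass nodes sit at different depths.
SAMPLE = """
<observation>
  <instrument>
    <bandPass>V</bandPass>
  </instrument>
  <calibration>
    <detail>
      <bandPass>R</bandPass>
    </detail>
  </calibration>
</observation>
"""


def grab_all(xml_text, tag):
    """Return a <results> element holding every descendant named `tag`."""
    root = ET.fromstring(xml_text)
    results = ET.Element("results")
    # ".//tag" is ElementTree's spelling of "//tag" relative to the root element.
    for node in root.iterfind(".//" + tag):
        results.append(node)
    return results


results = grab_all(SAMPLE, "bandPass")
print(ET.tostring(results, encoding="unicode"))
```

	The caller never says where bandPass lives in the tree, which is the point
	of the "//" form.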

>
> 2) Can it have astronomical knowledge built into it, or is it just a
> sort of dumb regexp system for structured text? What I mean is, if for
> instance you searched a StandardQuantity for a Frame (Frame "A") holding
> the 3 axes:
>
> (heliocentric radio velocity, ICRS RA, ICRS Dec)
>
> could XPATH do anything sensible if the StdQ did not contain this exact
> Frame, but instead contained a Frame (Frame "B") containing the 3 axes:
>
> (Galactic longitude, geocentric frequency, galactic latitude)

	Yes, you can limit the match based on the child nodes. For example, the XPath

	//standardQ[CoordQuantity/axes/helioRadioVelocity]

	will only grab a standardQ that has a heliocentric radio velocity axis. To do this
	search "right", XQuery is called for, e.g.

	for $quantity in //standardQuantity
	where $quantity/CoordQuantity/axes/helioRadioVelocity
	      and $quantity/CoordQuantity/axes/RA
	      and $quantity/CoordQuantity/axes/DE
	return $quantity

	This would only return quantities that *had* heliocentric radio velocity, ICRS RA and ICRS Dec
	in the frame; all other quantities would be ignored. If you need to
	process equivalent frames (e.g. the axes are different, but could be converted via
	a mapping), you can select those and pass them off to
	an application...
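
	To make the "select, then hand off" idea concrete, here is a minimal Python
	sketch using the standard library's ElementTree. It assumes the element
	layout used in the query above (standardQuantity/CoordQuantity/axes/...);
	the sample document, the id attributes, the axis names in Frame "B" and
	the split_by_axes helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: quantity "A" holds the requested axes, "B" does not.
SAMPLE = """
<dataset>
  <standardQuantity id="A">
    <CoordQuantity><axes>
      <helioRadioVelocity/><RA/><DE/>
    </axes></CoordQuantity>
  </standardQuantity>
  <standardQuantity id="B">
    <CoordQuantity><axes>
      <galacticLongitude/><geoFrequency/><galacticLatitude/>
    </axes></CoordQuantity>
  </standardQuantity>
</dataset>
"""

REQUIRED_AXES = ("helioRadioVelocity", "RA", "DE")


def split_by_axes(xml_text, required=REQUIRED_AXES):
    """Split standardQuantity nodes into exact matches and everything else."""
    root = ET.fromstring(xml_text)
    exact, others = [], []
    for quantity in root.iterfind(".//standardQuantity"):
        # Same test as the "where" clause: every required axis must be present.
        has_all = all(
            quantity.find("CoordQuantity/axes/" + axis) is not None
            for axis in required
        )
        (exact if has_all else others).append(quantity)
    return exact, others


exact, others = split_by_axes(SAMPLE)
print("exact match:", [q.get("id") for q in exact])
print("needs conversion check:", [q.get("id") for q in others])
```

	The second list is what you would pass to an application that knows how to
	map one frame onto another; the path matching itself carries no such
	astronomical knowledge.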

>

> My questions may have revealed that I have little experience with
> XPATH, but more experience with creating customised intelligent searching
> code such as that outlined above. Hence the questions...
>

	Hope that helps.

	=b.t.

> David

-- 

  * Dr. Brian Thomas 

  * Dept of Astronomy/University of Maryland-College Park 
  * Code 630.1/Goddard Space Flight Center-NASA

  *   fax: (301) 286-1775
  * phone: (301) 286-6128 [GSFC]
           (301) 405-2312 [UMD]