a recipe for crumpets

Ed Shaya edward.j.shaya.1 at gsfc.nasa.gov
Wed Jan 28 10:04:11 PST 2004


We are trying to synthesize a number of requirements into a consistent 
model.  We want to be able to make statements about very many different 
types of objects using a vocabulary of terms from UCDs that is well over 
1300 in number (to which we will be adding many more, I bet).  We want 
to be able to use XML tools, especially XPATH which then permits 
XQuery.  We need a high level language to express queries independent of 
any datacenter's organization.  We have extremely large quantities of 
data that require the speed and compact size of relational databases.
But our knowledge is not simply 2-dimensional, and so one wants to be 
able to address the data as if it is hierarchical, even though the 
internal storage and access MAY be relational.  This means that we 
need clear rules for flattening and "crumbling".

Start by noting that a record in a table is usually a list of Quantities 
about some Object.  So we should have a clear way to identify in our XML 
which elements are Objects and which are Properties, perhaps by 
namespacing them.  Along the way we find that there are a few tricks to 
designing the schemas so that one generates nicer tables and directions 
for VOTable to develop.

O=O(id,P*)
O are Objects.  Statements always begin with an O element.

Objects take P's (properties), which are of type A, G, or M.

A=A(value,error,units,O*)
A is an Atomic Quantity; an example is RA.  The child O's are Metadata.

G=G((A|G)*)
G is a Group Property of A's (or nested G's), where each A is typically 
different; an example is a position with several coordinates.  In fact 
each A requires a bit of grouping to hold it together also, but I ignore that.

M=M(O*)
M is a Membership Property that holds Objects.  For example, a globular 
cluster has M=MemberStars, which holds many O=star.  It is probably 
best if each M is constrained to a certain range of Object types.

All of this is much like OWL Lite, but I am paying special attention to 
properties which take physical Objects as children.  The OWL 
ObjectProperty is a property that takes an Instance, i.e. not a native 
number.  We are now working a notch above OWL because our Quantities are 
quite a bit richer than a common OWL property.
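
To make the four element types above concrete, here is a minimal sketch of 
O, A, G, and M as Python dataclasses.  The `name` and `type` fields, and all 
of the values in the example, are my own placeholders for illustration; they 
are not part of the definitions above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class A:
    """Atomic Quantity: A(value, error, units, O*)."""
    name: str                      # assumed label, e.g. "RA"
    value: float
    error: Optional[float] = None
    units: Optional[str] = None
    metadata: List["O"] = field(default_factory=list)   # child O's are Metadata


@dataclass
class G:
    """Group Property: G((A|G)*)."""
    name: str
    members: List[Union[A, "G"]] = field(default_factory=list)


@dataclass
class M:
    """Membership Property: M(O*)."""
    name: str
    objects: List["O"] = field(default_factory=list)


@dataclass
class O:
    """Object: O(id, P*), where each P is an A, G, or M."""
    type: str                      # assumed label for the Object type
    id: str
    properties: List[Union[A, G, M]] = field(default_factory=list)


# Placeholder values only; none of these numbers come from real data.
telescope = O("Telescope", "t1", [
    A("aperture_size", 1.0, units="m"),
    G("PositionGroup", [A("lat", 0.0, units="deg"),
                        A("long", 0.0, units="deg")]),
    M("M_hasInstruments", [O("Instrument", "i1"), O("Instrument", "i2")]),
])
```

Note that nothing here stops a G from holding an M; a stricter version would 
check that at construction time.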

A basic example that conforms to the rule that an O contains P's 
(including M's) and an M contains O's:
Telescope
    name
    type
    aperture size
    location
    PositionGroup
        lat
        long
    M_hasInstruments
        Instrument1
            name
            ....
        Instrument2
            name
            ....
/Telescope

We can incorporate an image into this (we may not want to, but it can be 
done without stretching too far) by simply noticing that each pixel 
mapped onto the sky is a region of the sky which is an Object.
We may need to extend our id to include a position Group.
So an image, spectrum, or time series is
I=I(O*,M)  The first O* is metadata and the M refers to a series of O(id,A),
as in M=[O(spot1,A), O(spot2,A), O(spot3,A),...., O(spotN,A)]
But in this fancy image one can add additional information at any spot.
So one can easily add in O(spot1,A/P1,A2/P2),O(spot2,A,M(O*)...), etc.  
Why can we do this?
Because it is XML and so you can do just about anything.

And in fact we can include spectra and time series in a similar way.  We 
simply think about a region in coordinate space as an Object.

The path to any A Quantity starts with an O, passes through 0 or more 
M/O pairs, then ends with a series of G's and finally the A.  For instance,
Xpath = /O/M/O/G/G/A
represents a cluster of galaxies that M_hasGalaxies; these have 
velocities measured, among which are radial velocities, and one of them 
is the radio redshift.

Xpath 
=/GalaxyCluster@id="343"/MemberStars/Star@id="2323"/Velocities/RadialVelocities/RadioCZ
(Actually I am cheating a bit on the Xpath expression just for explanation).
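
The path rule above (an O, then zero or more M/O pairs, then G's, then the A) 
can be checked mechanically once each step is reduced to its type letter.  
A minimal sketch, assuming that "a series of G's" may be empty so that an A 
can sit directly under an O; the function name is hypothetical:

```python
import re

# O, then zero or more M/O pairs, then zero or more G's, then the final A.
PATH_RULE = re.compile(r"/O(/M/O)*(/G)*/A")


def is_valid_type_path(path: str) -> bool:
    """Return True if a path of type letters follows the O (M/O)* G* A rule."""
    return PATH_RULE.fullmatch(path) is not None


ok = is_valid_type_path("/O/M/O/G/G/A")      # the example path above
bad = is_valid_type_path("/O/M/A")           # an M must hold O's, not an A
```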


There is a flattening algorithm that is wonderfully simple:
At the top level one can make tables of each ObjectType.  Then, whenever 
there is an M, each M becomes a table and the table id is the Xpath to M.
So there is a table here:
TableName='/GalaxyCluster@id="343"/MemberStars'
In the top level table, each A is 3 or so columns (value, error, units), 
but for an M property a single column contains the pointer to the "MTable".

The table consists of the stars in GalaxyCluster343 and has all of their A's 
and G's of A's.
In the unlikely case that there are actually several MemberStars at 
this point, one needs to allow for a qualifier attribute.  This does not 
modify the theory, though, because it is to be thought of as subclassing 
the M.
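
The flattening described above can be sketched in a few lines of Python.  
This is only an illustration under several assumptions of mine: M elements 
are recognized by an "M_" prefix on the tag, an A is a leaf element whose 
text is the value and whose error and units are attributes, G's are 
flattened into dotted column names, and paths use the proper XPath predicate 
form [@id="..."] rather than my shorthand.  The sample values are made up, 
and per-A metadata is ignored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def atom_columns(elem, prefix, row):
    """Write one A (or, recursively, a G of A's) into flat columns of `row`."""
    name = prefix + elem.tag
    if len(elem):                          # has children: treat as a G
        for child in elem:
            atom_columns(child, name + ".", row)
    else:                                  # leaf: treat as an A
        row[name] = elem.text.strip() if elem.text else None
        row[name + ".error"] = elem.get("error")
        row[name + ".units"] = elem.get("units")


def flatten(obj, table_name, parent_path, tables):
    """Add Object `obj` as a row of `table_name`; each M becomes its own table."""
    path = f'{parent_path}/{obj.tag}[@id="{obj.get("id")}"]'
    row = {"id": obj.get("id")}
    for prop in obj:
        if prop.tag.startswith("M_"):      # Membership Property
            m_path = f"{path}/{prop.tag}"  # the table id is the path to the M
            row[prop.tag] = m_path         # single column pointing to the MTable
            for member in prop:
                flatten(member, m_path, m_path, tables)
        else:
            atom_columns(prop, "", row)
    tables[table_name].append(row)
    return tables


# Made-up sample document; the numbers are placeholders, not real data.
doc = ET.fromstring("""
<GalaxyCluster id="343">
  <Redshift error="0.001">0.023</Redshift>
  <M_MemberStars>
    <Star id="2323">
      <Velocities>
        <RadialVelocities>
          <RadioCZ units="km/s">6900</RadioCZ>
        </RadialVelocities>
      </Velocities>
    </Star>
  </M_MemberStars>
</GalaxyCluster>
""")
tables = flatten(doc, doc.tag, "", defaultdict(list))
```

Here `tables` ends up with a top-level "GalaxyCluster" table and one MTable 
keyed by the path to M_MemberStars; the cluster row holds that path as its 
pointer.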

One thing that I swept under the rug is the metadata in each A. These 
can go into FIELD/Metadata.  But, if they differ from item to item then 
we need a column whose cells take XML.  Also note that an MTable in the 
Metadata is a likely occurrence, so this has to be transformed into a 
table and a pointer replaces it in the cell.
As it turns out, Norman Grey has just described how one adds extra 
branches of XML info into VOTable (see the VOTable discussion list, 
yesterday!).

My conclusion is that one gets a wonderfully simple but powerful 
mechanism if one can identify XML elements as one of type O, A, G, or 
M.  Actually O and A can be detected simply by position.  It is the 
M element that is difficult to distinguish (for the computer, that is) 
from A.  So we could name these special properties starting with M: or 
M_ or whatever.

This all follows from simply noting that a table is confined to 
/O/G/A or /O/G/G/A (or can be cast into this), but that these may be 
incorporated into a hierarchical pattern by linking properties, the M's.

If this works, it would mean that with a little bit of simple code to 
flatten and crumble and to convert XPATH into SQL, any relational 
database can become an XML ORDB.  The price is that schemas need to follow 
a few rules.
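
As a rough idea of what the XPATH-to-SQL step might look like, here is a 
sketch for the simplest case: a path that ends in a single A.  It assumes 
conventions that are mine and not settled anywhere: M elements carry an 
"M_" prefix, each MTable is named by the path to its M, G's are flattened 
into dotted column names, every table has an `id` column, and ids contain 
no "/" or quote characters.  The function names are hypothetical.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def xpath_to_sql(xpath: str):
    """Translate a path ending in an A into (sql, params) for the flat tables."""
    steps = xpath.strip("/").split("/")
    # The last M_ step names the table; with no M_ step, use the Object type.
    m_steps = [i for i, s in enumerate(steps) if s.startswith("M_")]
    if m_steps:
        last_m = m_steps[-1]
        table = "/" + "/".join(steps[: last_m + 1])
        obj_step = steps[last_m + 1]
    else:
        last_m = -1
        obj_step = steps[0]
        table = obj_step.split("[")[0]
    # Everything after the Object step is the G/.../A chain, i.e. the column.
    column = ".".join(steps[last_m + 2:])
    sql = f"SELECT {quote_ident(column)} FROM {quote_ident(table)}"
    params = ()
    if '[@id="' in obj_step:
        obj_id = obj_step.split('[@id="')[1].rstrip('"]')
        sql += " WHERE id = ?"
        params = (obj_id,)
    return sql, params


# Round trip against an in-memory SQLite table holding one made-up row.
table = '/GalaxyCluster[@id="343"]/M_MemberStars'
column = "Velocities.RadialVelocities.RadioCZ"
conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE {quote_ident(table)} (id TEXT, {quote_ident(column)} TEXT)"
)
conn.execute(f"INSERT INTO {quote_ident(table)} VALUES (?, ?)", ("2323", "6900"))
sql, params = xpath_to_sql(
    table + '/Star[@id="2323"]/Velocities/RadialVelocities/RadioCZ'
)
result = conn.execute(sql, params).fetchall()
```

A real translator would also have to handle predicates other than id, 
wildcards, and queries that span several MTables; none of that is attempted 
here.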

Ed




