a recipe for crumpets

Ed Shaya Edward.J.Shaya.1 at gsfc.nasa.gov
Thu Jan 29 09:47:29 PST 2004


Martin Hill wrote:

>
> (On a sidenote this is also why I'm not happy with the general concept 
> of Quantity: either it's going to have to be all things to all people, 
> or it will restrict the things we can represent.  It certainly puts a 
> layer between the things we need to represent, such as position, and 
> the primitives we would combine to do so).

Since XML Schema allows both extension and restriction, the Quantity 
Class can be subclassed to do just about anything that you have in mind.

> I believe we've already covered at least in principle how to map 
> between XML and existing databases on the votable list (see 
> http://ivoa.net/forum/votable/0549.htm).  Automating the mapping 
> process based on structure would be tricky - we want (well I want) 
> common XML exchange formats for our data, but it is likely everyone's 
> RDBMS datasets are in their own weird, er *individual* style.

To transform from the generic schema to VOTable will require the 
schemata extensions that Jim Gray mentioned or we will lose 
information.  Perhaps you want to drop the table view entirely?

> If we're considering how we might *create* a database from a given XML 
> document, say when uploading to a data warehouse, then we can map 
> directly from object<->table and primitive<->cell.

Yes, this is what I had in mind.

> I will have a mull over how to map between pointers in databases and 
> pointers in XML, but I suspect those of you (eg Ed?) with good archive 
> experience can think of an elegant answer.
>
In XML one can either use an ID/IDREF mechanism (if it is all in one 
logical document) or XPointer (if not).
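
To make the ID/IDREF case concrete, here is a minimal Python sketch of 
resolving a reference inside one document, much as one would follow a 
foreign key in a database.  The element names, the "telescopeRef" 
attribute and the sample data are all made up for illustration, and 
since no DTD or schema is loaded the code simply treats the "id" 
attribute as the ID by convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only.
DOC = """
<Catalog>
  <Telescope id="t1"><name>Example Scope</name></Telescope>
  <Instrument id="i1" telescopeRef="t1"><name>Example Camera</name></Instrument>
</Catalog>
"""

def build_id_index(root):
    """Map each id attribute value to its element (the role of a primary key)."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def resolve(index, element, ref_attr):
    """Follow an IDREF-style attribute, as one would follow a foreign key."""
    return index[element.get(ref_attr)]

root = ET.fromstring(DOC)
index = build_id_index(root)
instrument = index["i1"]
telescope = resolve(index, instrument, "telescopeRef")
print(telescope.find("name").text)
```

A reference into a different document would need XPointer or some 
other addressing scheme instead, which this sketch does not attempt.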

> Cheers,
>
> Martin
>
> Ed Shaya wrote:
>
>> We are trying to synthesize a number of requirements into a 
>> consistent model.  We want to be able to make statements about very 
>> many different types of objects using a vocabulary of terms from UCDs 
>> that is well over 1300 in number (to which we will be adding many 
>> more, I bet).  We want to be able to use XML tools, especially XPATH 
>> which then permits XQuery.  We need a high level language to express 
>> queries independent of any datacenter's organization.  We have 
>> extremely large quantities of data that require the speed and compact 
>> size of relational databases.
>> But our knowledge is not simply 2-dimensional, and so one wants to be 
>> able to address the data as if it is hierarchical, even though the 
>> internal storage and access MAY be relational.  This means that we 
>> need clear rules for flattening and "crumbling".
>> Start by noting that a record in a table is usually a list of 
>> Quantities about some Object.  So we should have a clear way to 
>> identify in our XML which elements are Objects and which are 
>> Properties, perhaps by namespacing them.  Along the way we find that 
>> there are a few tricks to designing the schemas so that one generates 
>> nicer tables and directions for VOTable to develop.
>>
>> O=O(id,P*)
>> O are Objects.  Statements always begin with an O element.
>>
>> Objects take P's, properties, which are of type A, G, or M.
>>
>> A=A(value,error,units,O*)
>> A is an Atomic Quantity, an example is RA, and the child O's are 
>> Metadata.
>>
>> G=G((A|G)*)
>> This is a Group Property of A's, each A typically is different, an 
>> example is position with several coordinates.  In fact each A 
>> requires a bit of grouping to hold it together also, but I ignore that.
>>
>> M=M(O*)
>> This is a Membership Property that holds Objects. For example, a 
>> globular cluster has M=MemberStars, which holds many O=star.  It is 
>> probably best if each M is constrained to a certain range of Object 
>> types.
>> All of this is much like OWL-lite but I am paying special attention 
>> to properties which take physical Objects as children.  The OWL 
>> objectProperty is a property that takes an Instance, ie not a native 
>> number.  We are now working a notch above OWL because our Quantities 
>> are quite a bit richer than a common OWL property.
>>
>> A basic example that conforms to O then P or M, M then O.
>> Telescope
>>     name
>>     type
>>    aperture size
>>    location
>>    PositionGroup
>>             lat
>>             long
>>    M_hasInstruments
>>          Instrument1
>>                name
>>                  ....
>>          Instrument2
>>                name
>>                   ....
>> /Telescope
>>
>> We can incorporate an image into this (we may not want to, but it can 
>> be done without stretching too far) by simply noticing that each 
>> pixel mapped onto the sky is a region of the sky which is an Object.
>> We may need to extend our id to include a position Group.
>> So an image, spectrum, or timeseries is
>> I=(O*,M)  The first O* is metadata and the M refers to a series of 
>> O(id,A)
>> as in M=[O(spot1,A), O(spot2,A), O(spot3,A),...., O(spotN,A)]
>> But, in this fancy image one can add additional information at any spot.
>> So, one can easily add-in O(spot1,A/P1,A2/P2),O(spot2,A,M(O*)...), 
>> etc.  Why can we do this?
>> Because it is XML and so you can do just about anything.
>>
>> And in fact we can include spectra and time series in a similar way.  
>> We simply think about a region in coordinate space as an Object.
>>
>> The path to any A Quantity starts with an O, passes through 0 or more 
>> M/O, then ends with a series of G's and finally the A.  For instance:
>> Xpath = /O/M/O/G/G/A
>> represents a cluster of galaxies with M_hasGalaxies, whose member 
>> galaxies have measured velocities, among them radial velocities, one 
>> of which is the radio redshift.
>>
>> Xpath 
>> =/GalaxyCluster@id="343"/MemberStars/Star@id="2323"/Velocities/RadialVelocities/RadioCZ 
>>
>> (Actually I am cheating a bit on the Xpath expression just for 
>> explanation).
>>
>>
>> There is a flattening algorithm that is wonderfully simple:
>> At the top level one can make tables of each ObjectType.  Then, 
>> whenever there is an M, each M becomes a table and the table id is 
>> the Xpath to M.
>> So there is a table here:
>> TableName='/GalaxyCluster@id="343"/MemberStars'
>> In the top level table, each A is 3 or so columns (value, error, 
>> units), but for an M property a single column contains the pointer 
>> to the "MTable".
>>
>> The table consists of stars in GalaxyCluster343 and has all of the 
>> A's and G's of A's.
>> On the unlikely chance that there are actually several MemberStars at 
>> this point one needs to allow for a qualifier attribute.  It does not 
>> modify the theory though because this is to be thought of as 
>> subclassing the M.
>>
>> One thing that I swept under the rug is the metadata in each A. These 
>> can go into FIELD/Metadata.  But, if they differ from item to item 
>> then we need a column whose cells take XML.  Also note that an Mtable 
>> in the Metadata is a likely occurrence, so this has to be transformed 
>> into a table and a pointer replaces it in the cell.
>> As it turns out Norman Gray has just described how one adds extra 
>> branches of XML info into VOTable (see the VOTable discussion list, 
>> yesterday!).
>>
>> My conclusion is that one gets a wonderfully simple but powerful 
>> mechanism if one can identify XML elements as one of type O, A, G, or 
>> M.  Actually O and A can be detected simply by position.  It is 
>> the M element that is difficult to distinguish (for the computer, 
>> that is) from A.  So we could name these special properties starting 
>> with M: or M_ or whatever.
>>
>> This all follows from simply noting that a table is confined to 
>> /O/G/A or /O/G/G/A (or can be cast into this) but that these may be 
>> incorporated into a hierarchical pattern by linking properties, M's.
>>
>> If this works, it would mean that with a little bit of simple code to 
>> flatten and crumble and to convert XPATH into SQL, any relational 
>> database can become an XML ORDB. The price is that schemas need to 
>> follow a few rules.
>>
>> Ed
>>
>>
>>
>>
>
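
The flattening rules in the quoted message can be tried out with a 
rough Python sketch like the one below.  It is only one possible 
reading of those rules, not a reference implementation.  The sample 
document, the M_ prefix for Membership Properties, the convention that 
an Atomic Quantity carries value/error/units as attributes, and the 
column-naming scheme are all assumptions made for the example.  Any 
element with children that is not an M_ element is treated as a Group.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; tag names, the M_ prefix and the attribute layout of an
# Atomic Quantity are assumptions made for this sketch.
DOC = """
<GalaxyCluster id="343">
  <Position>
    <ra value="187.7" error="0.1" units="deg"/>
    <dec value="12.4" error="0.1" units="deg"/>
  </Position>
  <M_MemberStars>
    <Star id="2323">
      <Velocities>
        <RadialVelocities>
          <RadioCZ value="1100" error="15" units="km/s"/>
        </RadialVelocities>
      </Velocities>
    </Star>
    <Star id="2324">
      <Velocities>
        <RadialVelocities>
          <RadioCZ value="1250" error="20" units="km/s"/>
        </RadialVelocities>
      </Velocities>
    </Star>
  </M_MemberStars>
</GalaxyCluster>
"""

def add_properties(element, prefix, path, row, tables):
    """Write the properties (A, G or M) of `element` into `row`."""
    for prop in element:
        name = prefix + prop.tag
        if prop.tag.startswith("M_"):
            # Membership Property: one pointer column plus a separate table
            # whose name is the path to the M element.
            table_name = f"{path}/{prop.tag}"
            row[name] = table_name
            for member in prop:
                flatten_object(member, table_name, table_name, tables)
        elif len(prop) > 0:
            # Group Property: recurse, extending the column-name prefix.
            add_properties(prop, name + "/", path, row, tables)
        else:
            # Atomic Quantity: value, error and units columns.
            for part in ("value", "error", "units"):
                row[f"{name}.{part}"] = prop.get(part)

def flatten_object(obj, table_name, parent_path, tables):
    """Append one row for Object `obj` to the table called `table_name`."""
    path = f'{parent_path}/{obj.tag}@id="{obj.get("id")}"'
    row = {"id": obj.get("id")}
    add_properties(obj, "", path, row, tables)
    tables.setdefault(table_name, []).append(row)

tables = {}
root = ET.fromstring(DOC)
flatten_object(root, "/" + root.tag, "", tables)
for name, rows in tables.items():
    print(name, rows)
```

The top-level table is named after the Object type, and the member 
table is named by the (simplified) path to its M element, so the 
pointer column in the parent row and the table name are the same 
string.  Per-item metadata on each A is left out here.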



