utypes: a proposal

Norman Gray norman at astro.gla.ac.uk
Thu Oct 30 08:45:01 PDT 2008


Folks,

Sorry I had to leave the DM session early yesterday -- I was due to  
give a talk in the parallel GWS session.

In the session, the comments I'd like to draw attention to are:  
Jonathan's remark that we probably shouldn't force utypes to do  
everything, and that worrying about uniqueness may be out of scope;  
Doug's emphasis that a goal was to 'flatten' a possibly strongly  
hierarchical structure into key-value pairs; and Tom's agreement that  
talking about namespaces does not imply XML.

The notion of 'utypes as xpaths' has been floating around for ever, it  
seems, and I'm sure I remember it being proposed by someone as an  
obvious solution at the first Cambridge-UK interop, back when the  
universe was a lot smaller.

Here, I want to make that proposal concrete.  Is there anything  
_really_ wrong with this model?



How about this:

     For each literal value defined in a data model, define as its utype
     the XPath which would retrieve that value from the 'natural' XML
     serialisation of the data model.

That's all -- beginning and end of proposal.  The following  
illustrates how this would appear.

[I'm _not_ suggesting we import all of XPath -- merely adopt a syntax  
which uses a tiny subset of XPath, which is therefore trivially  
compatible with it]

If the DM is actually _defined_ using an XML Schema, then this is
immediate, since the schema defines a serialisation, so the utypes
become 'xpaths in the instance'.

If the DM is defined in some other way -- such as the case of  
Characterisation, which is defined using UML -- then there is still  
almost certainly a 'natural' XML version of the model, such as the XML  
fragment of Char'n which Mireille showed in her presentation yesterday.

Even if the DM has no 'natural XML serialisation' (and I don't think  
we've seen one of them in the DM group's history), then there will  
surely be some part-of relationship which takes you to the value from  
the 'top' of the model.

Mireille showed a sample Char'n document in the session, which I think  
was something like:

	<characterization>
		<spatialAxis>
			<axisName>Sky</axisName>
			<ucd>pos.eq</ucd>
			<unit>deg</unit>
			<coverage>
				<location>
					<coord coord_system_id="TT-ICRS-TOPO">
						<stc:Position2D>
							<stc:Value2>
								<C1>132.4210</C1>
								<C2>12.1232</C2>
							</stc:Value2>
						</stc:Position2D>
					</coord>
				</location>
			</coverage>
		</spatialAxis>
	</characterization>

There are four literals in that example, namely 'pos.eq', 'deg',  
'132.4210' and '12.1232'.  Their utypes in this proposal would be

cha:characterization/cha:spatialAxis/cha:ucd
cha:characterization/cha:spatialAxis/cha:unit
cha:characterization/cha:spatialAxis/cha:coverage/cha:location/cha:coord/stc:Position2D/stc:Value2/C1
cha:characterization/cha:spatialAxis/cha:coverage/cha:location/cha:coord/stc:Position2D/stc:Value2/C2

...presuming some 'cha' namespace declaration.  Making utypes  
compatible with XPath ends up looking pretty much like the existing  
proposal, except that '.' -> '/' and the namespace prefix is repeated.
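To illustrate the 'incidentally useful' side of this, here is a minimal Python sketch which evaluates those utypes as paths against a namespaced version of the sample document. The namespace URIs, the explicit 'cha:' prefixes in the instance, and the helper name lookup() are all assumptions made for illustration; nothing here is an agreed serialisation.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: the proposal only presumes that some
# 'cha' (and 'stc') declaration exists.
NS = {
    "cha": "http://example.org/ns/characterisation",
    "stc": "http://example.org/ns/stc",
}

# The sample document, with assumed namespace declarations added.
# C1 and C2 are left unprefixed, as in the utypes above.
DOC = """\
<cha:characterization xmlns:cha="http://example.org/ns/characterisation"
                      xmlns:stc="http://example.org/ns/stc">
  <cha:spatialAxis>
    <cha:axisName>Sky</cha:axisName>
    <cha:ucd>pos.eq</cha:ucd>
    <cha:unit>deg</cha:unit>
    <cha:coverage>
      <cha:location>
        <cha:coord coord_system_id="TT-ICRS-TOPO">
          <stc:Position2D>
            <stc:Value2>
              <C1>132.4210</C1>
              <C2>12.1232</C2>
            </stc:Value2>
          </stc:Position2D>
        </cha:coord>
      </cha:location>
    </cha:coverage>
  </cha:spatialAxis>
</cha:characterization>
"""


def lookup(document, utype, namespaces):
    """Return the text of every element that the utype selects.

    ElementTree evaluates paths relative to an element's children, so
    the document root is hung under a throwaway holder element; the
    utype can then start at the root element's own name.
    """
    holder = ET.Element("holder")
    holder.append(ET.fromstring(document))
    return [el.text for el in holder.findall(utype, namespaces)]


UCD = "cha:characterization/cha:spatialAxis/cha:ucd"
C1 = ("cha:characterization/cha:spatialAxis/cha:coverage/cha:location/"
      "cha:coord/stc:Position2D/stc:Value2/C1")

print(lookup(DOC, UCD, NS))
print(lookup(DOC, C1, NS))
```

Each utype selects exactly one element here, so each call returns a one-item list. An application which is not handling XML at all can ignore this and treat the utype as an opaque key.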

Given that in many cases there would be only one primary data model in  
use, defining a default utype namespace would make these

characterization/spatialAxis/coverage/location/coord/stc:Position2D/stc:Value2/C1
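The shortening is purely mechanical, as the following Python sketch suggests. The function names abbreviate() and expand() are hypothetical, and so is the rule that expand() uses, namely that every unprefixed step belongs to the default namespace.

```python
def abbreviate(utype, default_prefix):
    """Drop the default prefix from every step that carries it."""
    lead = default_prefix + ":"
    return "/".join(
        step[len(lead):] if step.startswith(lead) else step
        for step in utype.split("/")
    )


def expand(utype, default_prefix):
    """Add the default prefix to every step that has no prefix.

    Assumption: an unprefixed step is read as being in the default
    namespace, so the unprefixed C1 comes back as '<prefix>:C1'.
    """
    return "/".join(
        step if ":" in step else default_prefix + ":" + step
        for step in utype.split("/")
    )


FULL = ("cha:characterization/cha:spatialAxis/cha:coverage/cha:location/"
        "cha:coord/stc:Position2D/stc:Value2/C1")

SHORT = abbreviate(FULL, "cha")
print(SHORT)
print(expand(SHORT, "cha"))
```

Note that the round trip is not exact under that assumption: C1 has no prefix in the long form, but expand() places it in the default namespace. Any real rule for a default utype namespace would have to say which reading is intended.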

XPaths are of course potentially a lot more complicated than that.   
But I'm not suggesting we permit anything but this tiny fragment of  
XPath; merely that we use a syntax which is trivially compatible with  
XPath, and has a precisely definable meaning.

This has a number of advantages:

   * It uses a fragment of an existing syntax -- we really, really,  
don't have to reinvent this wheel;

   * it's very clear where namespaces fit in;

   * in some cases where applications are actually processing XML, the  
utype might be incidentally useful as a way of extracting the literal  
values;

   * this syntax provides a _very_ clear illustration/definition of  
the cases where UFIs are potentially required, namely those situations  
where a simple hierarchy-based XPath such as this matches multiple  
literals in a file (is The Unicity Problem anything other than that?);

   * and so if it really _does_ turn out that UFIs are required, then  
it will be clear how to extend this syntax in a principled and  
controlled way, to create UFIs by cherrypicking one or two further  
elements of XPath.
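The last two points can be sketched in Python as follows. The document with two spatialAxis elements, the namespace URI, and the helper name matches() are invented for illustration; the positional predicate '[2]' is just one example of a further XPath element that could be cherrypicked, not a proposed UFI syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and a hypothetical document in which the
# same hierarchy occurs twice.
NS = {"cha": "http://example.org/ns/characterisation"}

DOC = """\
<cha:characterization xmlns:cha="http://example.org/ns/characterisation">
  <cha:spatialAxis><cha:unit>deg</cha:unit></cha:spatialAxis>
  <cha:spatialAxis><cha:unit>arcsec</cha:unit></cha:spatialAxis>
</cha:characterization>
"""


def matches(document, path, namespaces):
    """Return the text of every element selected by the path."""
    # Hang the root under a holder so the path can name the root element.
    holder = ET.Element("holder")
    holder.append(ET.fromstring(document))
    return [el.text for el in holder.findall(path, namespaces)]


UTYPE = "cha:characterization/cha:spatialAxis/cha:unit"

# The plain hierarchy-based path matches two literals: this is the
# situation in which something like a UFI would be needed.
found = matches(DOC, UTYPE, NS)
print(len(found) > 1, found)

# One further XPath feature, a positional predicate, picks out one.
SECOND = "cha:characterization/cha:spatialAxis[2]/cha:unit"
print(matches(DOC, SECOND, NS))
```

The check for 'more than one match' is entirely mechanical, which is the sense in which the syntax defines the problem cases.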

I don't believe it's sensible to omit namespacing.  If you go for
fixed prefixes -- 'cha:' and only 'cha:' -- then you can't sanely
version the Char'n model.  As was noted in the session, namespaces !=
XML, and if nothing else declaring a prefix/URL combination provides a
very natural mechanism for linking a data object with its
documentation.  XML obviously provides a straightforward means of
declaring namespaces; I can think of a couple of ways of doing so in
FITS; it must surely be equally straightforward to do the same for ADQL.
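As a sketch of why the prefix/URL binding matters for versioning, the following Python resolves each step of a utype to a (namespace URL, local name) pair. The URLs, the 'v1'/'v2' convention and the function name resolve() are all hypothetical; how the bindings get declared in FITS or ADQL is left open.

```python
def resolve(utype, bindings):
    """Replace each step's prefix by the URL it is bound to.

    Steps without a prefix are given None as their namespace.
    """
    resolved = []
    for step in utype.split("/"):
        prefix, sep, local = step.partition(":")
        if sep:
            resolved.append((bindings[prefix], local))
        else:
            resolved.append((None, step))
    return tuple(resolved)


# Hypothetical URLs for two versions of the model.
V1 = "http://example.org/ns/characterisation/v1"
V2 = "http://example.org/ns/characterisation/v2"

# Two data sets choose different prefixes for the same version ...
a = resolve("cha:characterization/cha:spatialAxis/cha:ucd", {"cha": V1})
b = resolve("c:characterization/c:spatialAxis/c:ucd", {"c": V1})

# ... and a third uses the familiar prefix for a later version.
c = resolve("cha:characterization/cha:spatialAxis/cha:ucd", {"cha": V2})

print(a == b)  # same model version, different prefixes
print(a == c)  # same prefix, different model versions
```

With a fixed 'cha:' prefix and no declared binding, the first and third utypes would be indistinguishable.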

Best wishes,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester


