utypes: a proposal

Norman Gray norman at astro.gla.ac.uk
Thu Oct 30 08:45:01 PDT 2008


Sorry I had to leave the DM session early yesterday -- I was due to  
give a talk in the parallel GWS session.

In the session, the comments I'd like to draw attention to are:  
Jonathan's remark that we probably shouldn't force utypes to do  
everything, and that worrying about uniqueness may be out of scope;  
Doug's emphasis that a goal was to 'flatten' a possibly strongly  
hierarchical structure into key-value pairs; and Tom's agreement that  
talking about namespaces does not imply XML.

The notion of 'utypes as xpaths' has been floating around for ever, it  
seems, and I'm sure I remember it being proposed by someone as an  
obvious solution at the first Cambridge-UK interop, back when the  
universe was a lot smaller.

Here, I want to make that proposal concrete.  Is there anything  
_really_ wrong with this model?

How about this:

     For each literal value defined in a data model, define as its utype
     the XPath which would retrieve that literal value from
     the 'natural' XML serialisation of the data model.

That's all -- beginning and end of proposal.  The following  
illustrates how this would appear.

[I'm _not_ suggesting we import all of XPath -- merely adopt a syntax  
which uses a tiny subset of XPath, which is therefore trivially  
compatible with it]
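As a rough illustration of the rule, here is a minimal Python sketch, assuming the standard-library xml.etree.ElementTree parser. It walks an XML instance and pairs each literal (an attribute value or a non-blank text node) with the simple child-axis XPath that retrieves it. The namespace URI and the element and attribute names (location, coord, c1, c2, ucd, unit) are hypothetical stand-ins, not the real Characterisation serialisation.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for a data model's namespace.
CHA_URI = "http://example.org/hypothetical/characterisation"
PREFIXES = {CHA_URI: "cha"}  # URI -> prefix used when writing utypes

# Hypothetical instance document; element names are illustrative only.
SAMPLE = f"""<cha:location xmlns:cha="{CHA_URI}" ucd="pos.eq" unit="deg">
  <cha:coord>
    <cha:c1>132.4210</cha:c1>
    <cha:c2>12.1232</cha:c2>
  </cha:coord>
</cha:location>"""


def qname(clark, prefixes):
    """Turn ElementTree's '{uri}local' form into 'prefix:local'."""
    if clark.startswith("{"):
        uri, local = clark[1:].split("}", 1)
        return f"{prefixes[uri]}:{local}"
    return clark  # unqualified name, e.g. a plain attribute


def utypes(xml_text, prefixes):
    """Return (utype, literal) pairs for every attribute value and
    every non-blank text node, in document order."""
    pairs = []

    def walk(elem, parent_path):
        path = f"{parent_path}/{qname(elem.tag, prefixes)}"
        for name, value in elem.attrib.items():
            pairs.append((f"{path}/@{qname(name, prefixes)}", value))
        if elem.text and elem.text.strip():
            pairs.append((path, elem.text.strip()))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return pairs


for utype, literal in utypes(SAMPLE, PREFIXES):
    print(f"{utype:40} {literal}")
```

On this sample it yields one utype per literal, such as /cha:location/@ucd and /cha:location/cha:coord/cha:c1. Only element names and a final attribute step ever appear: no predicates, no other axes.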

If the DM is actually _defined_ using an XML Schema, then this is
immediate: the schema defines a serialisation, and the utypes simply
become 'xpaths in the instance'.

If the DM is defined in some other way -- such as the case of  
Characterisation, which is defined using UML -- then there is still  
almost certainly a 'natural' XML version of the model, such as the XML  
fragment of Char'n which Mireille showed in her presentation yesterday.

Even if the DM has no 'natural' XML serialisation (and I don't think
we've seen such a model in the DM group's history), there will
surely be some chain of part-of relationships which takes you to the
value from the 'top' of the model.
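The part-of argument doesn't depend on XML at all. Here is a minimal sketch, assuming the model is held as nested Python dicts (a hypothetical stand-in for a UML-style composition hierarchy). It flattens the model into key-value pairs whose keys are the same slash-separated paths, which is one reading of the 'flattening into key-value pairs' mentioned above. The model, its names and the 'cha' prefix are invented for illustration.

```python
# Hypothetical model instance: each nested dict is a part-of relationship,
# each leaf string a literal value. Names are illustrative only.
MODEL = {
    "location": {
        "ucd": "pos.eq",
        "unit": "deg",
        "coord": {"c1": "132.4210", "c2": "12.1232"},
    }
}


def flatten(node, prefix, path=""):
    """Yield (utype, literal) pairs by following part-of links from the top.

    Without an XML serialisation there is no element/attribute distinction,
    so every step is written as 'prefix:name'.
    """
    for name, value in node.items():
        step = f"{path}/{prefix}:{name}"
        if isinstance(value, dict):
            yield from flatten(value, prefix, step)
        else:
            yield step, value


flat = dict(flatten(MODEL, "cha"))
for utype, literal in flat.items():
    print(utype, "=", literal)
```

The one thing lost relative to the XML case is the '@' marking attributes. That is a property of the serialisation, not of the model.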

Mireille showed a sample Char'n document in the session, which I think  
was something like:

					<coord coord_system_id="TT-ICRS-TOPO">

There are four literals in that example, namely 'pos.eq', 'deg',  
'132.4210' and '12.1232'.  Their utypes in this proposal would be


...presuming some 'cha' namespace declaration.  Making utypes  
compatible with XPath ends up looking pretty much like the existing  
proposal, except that '.' -> '/' and the namespace prefix is repeated.
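The correspondence between the two spellings is mechanical. Here is a minimal sketch, assuming a dotted utype of the form 'prefix:step.step.step' with a single prefix (the example utype is hypothetical, not taken from an actual model). It converts between the dotted form and the XPath-compatible form, in which the prefix is repeated on every step. Attribute steps are not handled.

```python
def dotted_to_xpath(utype):
    """'cha:a.b.c' -> '/cha:a/cha:b/cha:c' (prefix repeated on each step)."""
    prefix, sep, rest = utype.partition(":")
    if not sep or not rest:
        raise ValueError(f"expected 'prefix:a.b.c', got {utype!r}")
    return "".join(f"/{prefix}:{step}" for step in rest.split("."))


def xpath_to_dotted(xpath):
    """'/cha:a/cha:b/cha:c' -> 'cha:a.b.c'; all steps must share one prefix."""
    steps = [s.partition(":") for s in xpath.strip("/").split("/")]
    if any(not sep for _, sep, _ in steps):
        raise ValueError(f"every step needs a prefix: {xpath!r}")
    prefixes = {prefix for prefix, _, _ in steps}
    if len(prefixes) != 1:
        raise ValueError(f"mixed prefixes in {xpath!r}")
    return prefixes.pop() + ":" + ".".join(local for _, _, local in steps)


# Hypothetical utype, used only to show the round trip.
dotted = "cha:location.coord.c1"
xpath = dotted_to_xpath(dotted)
print(xpath)
assert xpath_to_dotted(xpath) == dotted
```

The reverse direction only round-trips if no step name itself contains a '.'.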

Given that in many cases there would be only one primary data model in  
use, defining a default utype namespace would make these


XPaths are of course potentially a lot more complicated than that.   
But I'm not suggesting we permit anything but this tiny fragment of  
XPath; merely that we use a syntax which is trivially compatible with  
XPath, and has a precisely definable meaning.
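'Precisely definable' can be taken literally: the permitted fragment is small enough to write down as a regular expression. Here is a minimal sketch, assuming the fragment consists only of absolute child steps over (optionally prefixed) names, with an optional final attribute step. The NCName pattern is a simplified ASCII-only approximation of the real XML rule. Anything else, including '//', predicates and functions, is rejected.

```python
import re

# Simplified, ASCII-only approximation of an XML NCName (the real rule
# admits many more characters).
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
QNAME = rf"(?:{NCNAME}:)?{NCNAME}"  # optional 'prefix:' then local name

# One or more '/step' child steps, optionally ending in '/@attribute'.
UTYPE_RE = re.compile(rf"(?:/{QNAME})+(?:/@{QNAME})?")


def is_simple_utype(text):
    """True if text lies inside the restricted XPath fragment."""
    return UTYPE_RE.fullmatch(text) is not None


for candidate in [
    "/cha:location/cha:coord/cha:c1",  # child steps only
    "/cha:location/@ucd",              # final attribute step
    "//cha:c1",                        # descendant axis: not allowed
    "/cha:location/cha:coord[2]",      # predicate: not allowed
]:
    print(f"{candidate:32} {is_simple_utype(candidate)}")
```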

This has a number of advantages:

   * It uses a fragment of an existing syntax -- we really, really
don't have to reinvent this wheel;

   * it's very clear where namespaces fit in;

   * in some cases where applications are actually processing XML, the
utype might be incidentally useful as a way of extracting the literal;

   * this syntax provides a _very_ clear illustration/definition of
the cases where UFIs are potentially required, namely those situations  
where a simple hierarchy-based XPath such as this matches multiple  
literals in a file (is The Unicity Problem anything other than that?);

   * and so if it really _does_ turn out that UFIs are required, then  
it will be clear how to extend this syntax in a principled and  
controlled way, to create UFIs by cherry-picking one or two further
elements of XPath.
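The unicity point can be shown directly. This minimal sketch relies on Python's xml.etree.ElementTree, whose findall understands namespace prefixes and the positional '[n]' predicate. It evaluates a simple utype against a hypothetical document containing two coord elements. The plain path matches two literals, which is the situation where a UFI would be needed. Adding one positional predicate makes the match unique. That predicate is just one example of cherry-picking a further XPath feature, not a proposal for which feature to pick. The element names and namespace URI are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and document: two coord elements share a path.
CHA_URI = "http://example.org/hypothetical/characterisation"
NS = {"cha": CHA_URI}
SAMPLE = f"""<cha:location xmlns:cha="{CHA_URI}">
  <cha:coord><cha:c1>132.4210</cha:c1></cha:coord>
  <cha:coord><cha:c1>10.0000</cha:c1></cha:coord>
</cha:location>"""


def select(root, utype, ns):
    """Evaluate an absolute path like '/cha:location/cha:coord/cha:c1'.

    ElementTree's findall only accepts paths relative to an element, so the
    first step is checked against the root by hand and the rest delegated.
    """
    first, _, rest = utype.lstrip("/").partition("/")
    prefix, _, local = first.rpartition(":")
    expected = f"{{{ns[prefix]}}}{local}" if prefix else local
    if root.tag != expected:
        return []
    return root.findall(rest, ns) if rest else [root]


root = ET.fromstring(SAMPLE)
plain = [e.text for e in select(root, "/cha:location/cha:coord/cha:c1", NS)]
picked = [e.text for e in select(root, "/cha:location/cha:coord[2]/cha:c1", NS)]
print("plain path matches:", plain)    # two literals: not unique
print("with [2] predicate:", picked)   # one literal: unique
```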

I don't believe it's sensible to omit namespacing.  If you go for  
fixed prefixes -- 'cha:' and only 'cha:' -- then you can't sanely
version the Char'n model.  As was noted in the session, namespaces !=  
XML, and if nothing else declaring a prefix/URL combination provides a  
very natural mechanism for linking a data object with its  
documentation.  XML obviously provides a straightforward means of  
declaring namespaces; I can think of a couple of ways of doing so in
FITS; it must surely be equally straightforward to do the same for ADQL.
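To illustrate why declared prefixes, rather than fixed ones, make versioning workable, here is a minimal sketch that needs no XML parser. It expands each prefixed step of a utype into Clark notation, '{uri}local' (the convention ElementTree uses), given whatever prefix-to-URI declarations accompany the data. How those declarations would be carried in FITS or ADQL is not addressed. Unprefixed element steps take a default namespace if one is declared, as suggested earlier for the common single-model case. Unprefixed attributes stay in no namespace, following XML's rule. The v1/v2 URIs are hypothetical.

```python
def expand(utype, prefixes, default_uri=None):
    """Rewrite each step of a simple utype as '{uri}local' (Clark notation).

    prefixes maps each declared prefix to its namespace URI; unprefixed element
    steps use default_uri, while unprefixed attributes stay in no namespace.
    """
    steps = []
    for step in utype.strip("/").split("/"):
        is_attr = step.startswith("@")
        name = step[1:] if is_attr else step
        prefix, sep, local = name.rpartition(":")
        uri = prefixes[prefix] if sep else (None if is_attr else default_uri)
        expanded = f"{{{uri}}}{local}" if uri else local
        steps.append(("@" if is_attr else "") + expanded)
    return "/" + "/".join(steps)


# Hypothetical URIs for two versions of the same model.
V1 = "http://example.org/hypothetical/characterisation/v1"
V2 = "http://example.org/hypothetical/characterisation/v2"

a = expand("/cha:location/cha:coord", {"cha": V1})
b = expand("/char:location/char:coord", {"char": V1})  # different prefix, same URI
c = expand("/cha:location/cha:coord", {"cha": V2})     # same prefix, new version
d = expand("/location/coord", {}, default_uri=V1)      # default namespace
print(a == b, a == c, a == d)  # True False True
```

Comparing expanded names rather than prefixed strings is what lets 'cha:' and 'char:' mean the same thing, and lets v1 and v2 of a model coexist.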

Norman Gray  :  http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester

