utypes: a proposal
Norman Gray
norman at astro.gla.ac.uk
Thu Oct 30 08:45:01 PDT 2008
Folks,
Sorry I had to leave the DM session early yesterday -- I was due to
give a talk in the parallel GWS session.
In the session, the comments I'd like to draw attention to are:
Jonathan's remark that we probably shouldn't force utypes to do
everything, and that worrying about uniqueness may be out of scope;
Doug's emphasis that a goal was to 'flatten' a possibly strongly
hierarchical structure into key-value pairs; and Tom's agreement that
talking about namespaces does not imply XML.
The notion of 'utypes as xpaths' has been floating around for ever, it
seems, and I'm sure I remember it being proposed by someone as an
obvious solution at the first Cambridge-UK interop, back when the
universe was a lot smaller.
Here, I want to make that proposal concrete. Is there anything
_really_ wrong with this model?
How about this:
For each literal value defined in a data model, define as its utype
the XPath which would retrieve that literal value from the 'natural'
XML serialisation of the data model.
That's all -- beginning and end of proposal. The following
illustrates how this would appear.
[I'm _not_ suggesting we import all of XPath -- merely adopt a syntax
which uses a tiny subset of XPath, which is therefore trivially
compatible with it]
If the DM is actually _defined_ using an XML Schema, then this is
immediate, since the schema defines a serialisation, so the utypes
become 'xpaths in the instance'.
If the DM is defined in some other way -- such as the case of
Characterisation, which is defined using UML -- then there is still
almost certainly a 'natural' XML version of the model, such as the XML
fragment of Char'n which Mireille showed in her presentation yesterday.
Even if the DM has no 'natural XML serialisation' (and I don't think
we've seen one of those in the DM group's history), there will
surely be some part-of relationship which takes you to the value from
the 'top' of the model.
Mireille showed a sample Char'n document in the session, which I think
was something like:
  <characterization>
    <spatialAxis>
      <axisName>Sky</axisName>
      <ucd>pos.eq</ucd>
      <unit>deg</unit>
      <coverage>
        <location>
          <coord coord_system_id="TT-ICRS-TOPO">
            <stc:Position2D>
              <stc:Value2>
                <C1>132.4210</C1>
                <C2>12.1232</C2>
              </stc:Value2>
            </stc:Position2D>
          </coord>
        </location>
      </coverage>
    </spatialAxis>
  </characterization>
There are four literals in that example, namely 'pos.eq', 'deg',
'132.4210' and '12.1232'. Their utypes in this proposal would be
  cha:characterization/cha:spatialAxis/cha:ucd
  cha:characterization/cha:spatialAxis/cha:unit
  cha:characterization/cha:spatialAxis/cha:coverage/cha:location/
      cha:coord/stc:Position2D/stc:Value2/C1
  cha:characterization/cha:spatialAxis/cha:coverage/cha:location/
      cha:coord/stc:Position2D/stc:Value2/C2
...presuming some 'cha' namespace declaration. Making utypes
compatible with XPath ends up looking pretty much like the existing
proposal, except that '.' -> '/' and the namespace prefix is repeated.
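
As a rough sketch of how utypes of this form could be derived mechanically from a serialisation, the following walks a document and emits one (utype, literal) pair per leaf element, which is also the 'flattening' into key-value pairs. It assumes Python's standard xml.etree.ElementTree, uses a trimmed copy of the sample above containing only the four literals listed, writes the 'cha' prefix explicitly, and invents the namespace URIs (none are given here); the names `PREFIXES`, `step` and `flatten` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented purely for this illustration.
PREFIXES = {
    "http://example.org/cha": "cha",
    "http://example.org/stc": "stc",
}

# Trimmed copy of the sample document, with explicit prefixes.
DOC = """
<cha:characterization xmlns:cha="http://example.org/cha"
                      xmlns:stc="http://example.org/stc">
  <cha:spatialAxis>
    <cha:ucd>pos.eq</cha:ucd>
    <cha:unit>deg</cha:unit>
    <cha:coverage><cha:location><cha:coord>
      <stc:Position2D><stc:Value2>
        <C1>132.4210</C1>
        <C2>12.1232</C2>
      </stc:Value2></stc:Position2D>
    </cha:coord></cha:location></cha:coverage>
  </cha:spatialAxis>
</cha:characterization>
"""


def step(tag):
    """Turn ElementTree's '{uri}local' tag into a 'prefix:local' path step."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return f"{PREFIXES[uri]}:{local}"
    return tag  # element in no namespace, such as C1


def flatten(element, path=()):
    """Yield (utype, literal) for every leaf element that carries text."""
    path = path + (step(element.tag),)
    children = list(element)
    if not children:
        text = (element.text or "").strip()
        if text:
            yield "/".join(path), text
    for child in children:
        yield from flatten(child, path)


pairs = list(flatten(ET.fromstring(DOC.strip())))
for utype, value in pairs:
    print(utype, "=", value)
```

The paths come out in document order, and the first one is `cha:characterization/cha:spatialAxis/cha:ucd` paired with `pos.eq`.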
Given that in many cases there would be only one primary data model in
use, defining a default utype namespace would shorten these to, for
example,
  characterization/spatialAxis/coverage/location/coord/stc:Position2D/
      stc:Value2/C1
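
A minimal sketch of that abbreviation, assuming 'cha' is the declared default utype namespace; the function name `abbreviate` and the constant `DEFAULT_PREFIX` are hypothetical, and this is only one way the shortening might be done.

```python
DEFAULT_PREFIX = "cha"  # assumed default utype namespace


def abbreviate(utype, default=DEFAULT_PREFIX):
    """Drop the default prefix from every step that carries it."""
    head = default + ":"
    steps = utype.split("/")
    return "/".join(s[len(head):] if s.startswith(head) else s for s in steps)


full = ("cha:characterization/cha:spatialAxis/cha:coverage/cha:location/"
        "cha:coord/stc:Position2D/stc:Value2/C1")
short = abbreviate(full)
print(short)
```

The reverse mapping is deliberately not attempted: a step that was already unprefixed (such as C1) would be indistinguishable from one whose default prefix had been dropped.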
XPaths are of course potentially a lot more complicated than that.
But I'm not suggesting we permit anything but this tiny fragment of
XPath; merely that we use a syntax which is trivially compatible with
XPath, and has a precisely definable meaning.
This has a number of advantages:
* It uses a fragment of an existing syntax -- we really, really,
don't have to reinvent this wheel;
* it's very clear where namespaces fit in;
* in some cases where applications are actually processing XML, the
utype might be incidentally useful as a way of extracting the literal
values;
* this syntax provides a _very_ clear illustration/definition of
the cases where UFIs are potentially required, namely those situations
where a simple hierarchy-based XPath such as this matches multiple
literals in a file (is The Unicity Problem anything other than that?);
* and so if it really _does_ turn out that UFIs are required, then
it will be clear how to extend this syntax in a principled and
controlled way, to create UFIs by cherrypicking one or two further
elements of XPath.
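
To illustrate the extraction and unicity points in the list above, here is a hedged sketch using Python's standard xml.etree.ElementTree. The document is contrived (two spatialAxis elements, not taken from the Characterisation model), the namespace URI is invented, and the positional predicate `[1]` is just one example of a 'further element of XPath'; nothing here says which elements a UFI syntax would actually adopt. The helper name `select` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"cha": "http://example.org/cha"}  # hypothetical URI

# Contrived document: the simple path to cha:unit matches two literals.
DOC = """
<cha:characterization xmlns:cha="http://example.org/cha">
  <cha:spatialAxis><cha:unit>deg</cha:unit></cha:spatialAxis>
  <cha:spatialAxis><cha:unit>arcsec</cha:unit></cha:spatialAxis>
</cha:characterization>
"""


def select(root, utype, ns=NS):
    """Evaluate a utype as an XPath whose first step names the root element.

    ElementTree paths are relative to an element's children, so the root
    is placed under a throwaway parent before searching.
    """
    holder = ET.Element("holder")
    holder.append(root)
    return [el.text for el in holder.findall(utype, ns)]


root = ET.fromstring(DOC.strip())

# The plain hierarchical utype is ambiguous in this document.
matches = select(root, "cha:characterization/cha:spatialAxis/cha:unit")
ambiguous = len(matches) > 1

# One extra piece of XPath singles out a single literal.
first = select(root, "cha:characterization/cha:spatialAxis[1]/cha:unit")
print(matches, ambiguous, first)
```

Here the unextended utype returns both 'deg' and 'arcsec', which is exactly the situation in which something like a UFI would be needed.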
I don't believe it's sensible to omit namespacing. If you go for
fixed prefixes -- 'cha:' and only 'cha:' -- then you can't sanely
version the Char'n model. As was noted in the session, namespaces !=
XML, and if nothing else declaring a prefix/URL combination provides a
very natural mechanism for linking a data object with its
documentation. XML obviously provides a straightforward means of
declaring namespaces; I can think of a couple of ways of doing so in
FITS; it must surely be equally straightforward to do the same for ADQL.
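
A small sketch of why declared prefix/URL pairs matter for versioning: once each prefix is resolved to its URL, the choice of prefix is irrelevant and two versions of a model are distinguishable. The URLs, the version numbers in them, and the function name `expand_utype` are all invented for illustration; the '{url}name' notation is the one Python's ElementTree uses for qualified names.

```python
def expand_utype(utype, declarations):
    """Replace each 'prefix:name' step with '{url}name' using the declarations."""
    steps = []
    for s in utype.split("/"):
        prefix, sep, local = s.partition(":")
        if sep:
            steps.append("{%s}%s" % (declarations[prefix], local))
        else:
            steps.append(s)  # unprefixed step is left as it is
    return "/".join(steps)


# Hypothetical URLs for two versions of the Char'n model.
V1 = "http://example.org/cha/1.0"
V2 = "http://example.org/cha/1.1"

a = expand_utype("cha:characterization/cha:spatialAxis/cha:ucd", {"cha": V1})
b = expand_utype("c:characterization/c:spatialAxis/c:ucd", {"c": V1})
c = expand_utype("cha:characterization/cha:spatialAxis/cha:ucd", {"cha": V2})

print(a == b)  # same model version, different prefix
print(a == c)  # same prefix, different model version
```

With a fixed 'cha:' prefix and no declaration, the third case could not be told apart from the first.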
Best wishes,
Norman
--
Norman Gray : http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester