UType proposals
Norman Gray
norman at astro.gla.ac.uk
Fri Jun 12 06:08:14 PDT 2009
Gerard and all, hello.
On 2009 Jun 12, at 12:48, Gerard wrote:
> But I definitely think there is merit in proposing a readable syntax for
> Utype-s derived from a (well designed) data model.
I agree. Mireille's document prescribes how to create the UType
strings from an explicit data model. These would be exactly the same
in the proposal I made, with the only difference being that
<model-name> would be a dereferenceable URL.
In contrast to Mireille's document, I don't believe it's necessary for
a UType document to _require_ that UTypes be created in a particular
way: as long as a standard's list of UTypes is made explicit, and
their links to the data model are documented, the mechanism for
generating them can be 'private' to the standard. Of course, the
recipe that Mireille described is a good one, and document authors
might well be expected to use this recipe unless they had good reason
not to.
> mainly because we did not want to have to do two jobs:
> 1. create the model
> 2. create a list of strings uniquely representing the elements in
> the model.
Certainly, I think that generating the UTypes mechanically from some
explicit underlying data model is very much the right thing to do.
> The latter allows one to find the element represented
> by the string in the model by parsing the string.
I'm nervous of 'parsing the string'. Partly, this is because defining
the UType syntax which is to be parsed would, I believe, place a
substantial burden on the UType document -- it's essentially defining
a new little language, and there are enough little languages in the
world without us amateurs defining a new one. Also, however, making
the UType parseable would imply that users (ie, application authors in
the first instance) would be expected to parse it in order to get
important information, and that's not likely to be popular.
I also don't think it's necessary, since the sort of extra information
that might be available -- for example display strings/labels -- is
really part of the 'machine-readable documentation' that would be
readily available by dereferencing the namespace (though I emphasise
again that applications would generally not have to do this). Of
course, the form of that machine-readable documentation would still
have to be defined (though I have suggestions), but there are existing
extensible syntaxes for that, which don't have to be crammed into a
UType string.
> In our rule (repeated in Mireille's note) the uniqueness of the string is
> guaranteed by the structure of the rule and the constraints that data
> models impose (e.g. uniqueness of names of classes in a package,
> uniqueness of attribute names in a class etc).
Indeed -- a valuable property.
> The HTML documentation for the model is generated and contains for each
> documented element its Utype.
> Also the "intermediate representation", a more readable XML version of
> the XMI that is the basis for our generation pipeline, should contain the
> utype for each element. So in principle the association is there and we
> could have used SimDB:element1 etc.
I agree. All the required information, and more, is clearly already
available within SimDB artefacts, and it's just a matter of getting
this to the application.
> So for the DM group to propose a rule for deriving Utype-s from the data
> model that does not require a separate lookup, i.e. is parsable by
> humans, I think is not a bad thing as it promotes homogeneity and
> readability between different modelling efforts.
The recipe that Mireille proposes is informally parseable by humans,
and that has substantial mnemonic value if it is used homogeneously.
It's just that turning that into a formally parseable thing would have
substantial costs with few benefits.
> One should be able to infer from the context within which the utype is
> used where to go to find this namespace. That's why I think such
> namespaces should for example be explicitly defined in a VOTable that
> uses a particular set of Utype-s. It should not depend on some list of
> abbreviations somewhere in the IVOA.
> Hence I also do not like the proposal NOT to insist on an xmlns:adql=...
> declaration to infer info about the new 'xtype' attribute.
I may be slightly missing your point (apologies if so), but I'm
suggesting that if the full URI (and only the full URI) is regarded as
the UType, then this gives a good deal of flexibility to a
serialisation. Thus, we might have a VOTable which has
<param id='foo' utype='http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo'>
xxx
</param>
or
<VOTABLE xmlns:simdb='http://www.ivoa.net/dm/simdb/v1.0#'>
...
<param id='foo' utype='simdb:Simulated.Foo'>
xxx
</param>
</VOTABLE>
and these would be regarded as equivalent, and the procedure for
turning the second into the first would be the province of the
definition of the serialisation. The second is obviously a lot
tidier, but the freedom to do this is merely a detail of the
serialisation, which piggybacks on the XML namespace mechanism for the
practical reason that VOTable is expressed in XML. A FITS
serialisation would have to choose a completely different way of
representing this, if that were deemed necessary. That doesn't
matter, because it's the full URI -- possibly after some string
concatenation -- which the resulting application would be required to
recognise.
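As a rough illustration of that string concatenation, here is a minimal
Python sketch. It is not part of any VOTable or UType specification: the
function name expand_utype and the prefix table are made up for the
example, and a real serialisation would define for itself how prefixes
are declared and resolved.

```python
def expand_utype(utype, prefixes):
    """Expand a prefixed UType such as 'simdb:Simulated.Foo' to a full URI.

    `prefixes` maps a prefix to a namespace URI, standing in for whatever
    declarations (e.g. xmlns attributes) the serialisation has in scope.
    A value whose leading 'prefix:' is not declared is returned unchanged,
    so an already-expanded URI passes straight through.
    """
    prefix, sep, local = utype.partition(":")
    if sep and prefix in prefixes:
        # Plain concatenation of namespace URI and local name.
        return prefixes[prefix] + local
    return utype


# Hypothetical prefix table, mirroring the xmlns:simdb declaration above.
prefixes = {"simdb": "http://www.ivoa.net/dm/simdb/v1.0#"}

short_form = expand_utype("simdb:Simulated.Foo", prefixes)
long_form = expand_utype(
    "http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo", prefixes
)
# Both spellings end up as the same full URI.
assert short_form == long_form
```

An application would then compare only the expanded strings, and would
never need to look inside them.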
> I also agree that the Utype itself should be dereferenceable to HTML.
> The HTML that we generate for SimDB's data model can easily be
> accommodated to be such a target.
> Currently we use XMI identifiers as anchors for cross-linking, but as our
> Utype-s are unique those could be used as well.
Sounds good.
> Finally, I think one thing that Mireille's note does not make clear is
> that to be able to have a rule deriving parsable Utypes from a data model
> such as the one used in SimDB, one must have defined the syntactic
> elements for expressing one's data model. In SimDB we do this explicitly
> and we have proposed a similar approach to the DM group. Once one has
> that one may also have hope of creating instances for a specified Utype.
I don't think I follow you. By 'syntactic elements' do you mean some
parseable syntax for UTypes?
> Really finally, I think we still need discussion about what it really
> means to associate a Utype to some serialised construct.
> For example, in VOTable, associating a Utype to a column, pointing to an
> attribute in a data model seems pretty well defined.
> The column contains values that may have been obtained from that
> attribute on instances of the referenced.
Yes, but...
> However what it means to associate a Utype to a TABLE is less clear.
> Should it be formally and completely equivalent to the referenced Class
> (supposedly) in a UML diagram?
> Should it then have the same attributes (represented as columns), for
> example?
> Or is the meaning less strict, more conceptual? Like "the TABLE is
> somewhat similar to a SimDB:Simulation" for example.
This is indeed much less clear. I think it would be valuable and
fairly easy to do this, and it would mean that you could envisage a
future query which asked for (all of) a table by giving the table's
UType. The framework for this is in, for example, section 3 of the
proposal <http://nxg.me.uk/note/2009/utype-proposals/#composite>,
which effectively suggests that char:coverage.location.coord be a
UType which has a structured value, and this is perfectly strict.
However, as Doug and others have argued, the current primary use-case
for UTypes is the notion of a list of key-value pairs, where the keys
are UTypes and the values are literals (ie, columns or single
values). I think there are benefits and few costs to going beyond
that (if one has a clear idea of what one is doing), but that's where
this argument would live.
All the best,
Norman
--
Norman Gray : http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester