UType proposals

Fri Jun 12 06:08:14 PDT 2009

Gerard and all, hello.

On 2009 Jun 12, at 12:48, Gerard wrote:

> But I definitely think there is merit in proposing a readable syntax for
> Utype-s derived from a (well designed) data model.

I agree.  Mireille's document prescribes how to create the UType  
strings from an explicit data model.  These would be exactly the same  
in the proposal I made, with the only difference being that
<model-name> would be a dereferenceable URL.

In contrast to Mireille's document, I don't believe it's necessary for  
a UType document to _require_ that UTypes be created in a particular  
way: as long as a standard's list of UTypes is made explicit, and  
their links to the data model are documented, the mechanism for  
generating them can be 'private' to the standard.  Of course, the  
recipe that Mireille described is a good one, and document authors  
might well be expected to use this recipe unless they had good reason  
not to.

> mainly because we did not want to have to do two jobs:
> 1. create the model
> 2. create a list of strings uniquely representing the elements in the
> model.

Certainly, I think that generating the UTypes mechanically from some  
explicit underlying data model is very much the right thing to do.

> The latter allows one to find the element represented
> by the string in the model by parsing the string.

I'm nervous of 'parsing the string'.  Partly, this is because defining  
the UType syntax which is to be parsed would, I believe, place a  
substantial burden on the UType document -- it's essentially defining  
a new little language, and there are enough little languages in the  
world without us amateurs defining a new one.  Also, however, making  
the UType parseable would imply that users (ie, application authors in  
the first instance) would be expected to parse it in order to get  
important information, and that's not likely to be popular.

I also don't think it's necessary, since the sort of extra information  
that might be available -- for example display strings/labels -- is  
really part of the 'machine-readable documentation' that would be  
readily available by dereferencing the namespace (though I emphasise  
again that applications would generally not have to do this).  Of  
course, the form of that machine-readable documentation would still  
have to be defined (though I have suggestions), but there are existing  
extensible syntaxes for that, which don't have to be crammed into a  
UType string.

> In our rule (repeated in Mireille's note) the uniqueness of the string
> is guaranteed by the structure of the rule and the constraints that data
> models impose (e.g. uniqueness of names of classes in a package,
> uniqueness of attribute names in a class etc).

Indeed -- a valuable property.
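
As a rough illustration of mechanical generation with a uniqueness check, here is a minimal Python sketch.  The dotted Class.attribute form, the toy model, and the names generate_utypes, Bar and Protocol are all assumptions made for illustration; this is not the recipe in Mireille's document.

```python
def generate_utypes(model):
    """Build an explicit list of UType strings from a toy data model.

    `model` maps class names to lists of attribute names (a hypothetical
    stand-in for a real data model).  The result maps each generated
    string back to the (class, attribute) pair it stands for, so nobody
    needs to parse the string to find the model element.
    """
    utypes = {}
    for class_name, attributes in model.items():
        for attribute in attributes:
            utype = f"{class_name}.{attribute}"  # assumed form, illustration only
            if utype in utypes:
                # Only possible if the model repeats an attribute in a class.
                raise ValueError(f"duplicate UType: {utype}")
            utypes[utype] = (class_name, attribute)
    return utypes


# Hypothetical model: 'Simulated.Foo' echoes the VOTable example in this
# message; the other names are invented.
model = {"Simulated": ["Foo", "Bar"], "Protocol": ["Foo"]}
utypes = generate_utypes(model)
```

The returned table is the 'explicit list' itself: the link from each string to the model is recorded alongside it rather than being recovered by parsing.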

> The HTML documentation for the model is generated and contains for each
> documented element its Utype.
> Also the "intermediate representation", a more readable XML version of
> the XMI that is the basis for our generation pipeline, should contain the
> utype for each element. So in principle the association is there and we
> could have used SimDB:element1 etc.

I agree.  All the required information, and more, is clearly already  
available within SimDB artefacts, and it's just a matter of getting  
this to the application.

> So for the DM group to propose a rule for deriving Utype-s from the data
> model that does not require a separate lookup, i.e. is parsable by
> humans I think is not a bad thing as it promotes homogeneity and
> readability between different modelling efforts.

The recipe that Mireille proposes is informally parseable by humans,  
and that has substantial mnemonic value if it is used homogeneously.   
It's just that turning that into a formally parseable thing would have  
substantial costs with few benefits.

> One should be able to infer from the context within which the utype is
> used where to go to find this namespace. That's why I think such
> namespaces should for example be explicitly defined in a VOTable that
> uses a particular set of Utype-s. It should not depend on some list of
> abbreviations somewhere in the IVOA.
> Hence I also do not like the proposal NOT to insist on an xmlns:adql=...
> declaration to infer info about the new 'xtype' attribute.

I may be slightly missing your point (apologies if so), but I'm  
suggesting that if the full URI (and only the full URI) is regarded as  
the UType, then this gives a good deal of flexibility to a  
serialisation.  Thus, we might have a VOTable which has

     <param id='foo' utype='http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo'>
       xxx
     </param>

or
     <VOTABLE xmlns:simdb='http://www.ivoa.net/dm/simdb/v1.0#'>
       ...
       <param id='foo' utype='simdb:Simulated.Foo'>
         xxx
       </param>
     </VOTABLE>

and these would be regarded as equivalent, and the procedure for  
turning the second into the first would be the province of the  
definition of the serialisation.  The second is obviously a lot  
tidier, but the freedom to do this is merely a detail of the  
serialisation, which piggybacks on the XML namespace mechanism for the  
practical reason that VOTable is expressed in XML.  A FITS  
serialisation would have to choose a completely different way of  
representing this, if that were deemed necessary.  That doesn't  
matter, because it's the full URI -- possibly after some string  
concatenation -- which the resulting application would be required to  
recognise.
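
A minimal sketch of that string concatenation step, in Python, might look as follows.  The function name expand_utype, the rule that anything containing '://' is already a full URI, and the idea of handing the prefix declarations over as a plain dictionary are assumptions for illustration; how a serialisation actually exposes its prefixes is its own business.

```python
def expand_utype(utype, prefixes):
    """Turn a prefixed UType such as 'simdb:Simulated.Foo' into a full URI.

    `prefixes` maps prefix -> namespace URI, as a serialisation might
    collect them (for VOTable, from the xmlns declarations in scope).
    """
    if "://" in utype:
        # Assumed heuristic: this is already the full URI, pass it through.
        return utype
    prefix, sep, local = utype.partition(":")
    if sep and prefix in prefixes:
        # Plain concatenation: no parsing of the local part is attempted.
        return prefixes[prefix] + local
    raise ValueError(f"cannot expand UType {utype!r}: unknown prefix")


prefixes = {"simdb": "http://www.ivoa.net/dm/simdb/v1.0#"}
short_form = expand_utype("simdb:Simulated.Foo", prefixes)
long_form = expand_utype(
    "http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo", prefixes
)
```

Both forms of the VOTable example above come out as the same string, and that string is the only thing an application would need to compare.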

> I also agree that the Utype itself should be dereferenceable to HTML.
> The HTML that we generate for SimDB's data model can easily be
> accommodated to be such a target.
> Currently we use XMI identifiers as anchors for cross-linking, but as our
> Utype-s are unique those could be used as well.

Sounds good.

> Finally, I think one thing that Mireille's note does not make clear is
> that to be able to have a rule deriving parsable Utypes from a data model
> such as the one used in SimDB, one must have defined the syntactic
> elements for expressing one's data model. In SimDB we do this explicitly
> and we have proposed a similar approach to the DM group. Once one has
> that one may also have hope of creating instances for a specified Utype.

I don't think I follow you.  By 'syntactic elements' do you mean some  
parseable syntax for UTypes?

> Really finally, I think we still need discussion about what it really
> means to associate a Utype to some serialised construct.
> For example, in VOTable, associating a Utype to a column, pointing to an
> attribute in a data model seems pretty well defined.
> The column contains values that may have been obtained from that
> attribute on instances of the referenced.

Yes, but...

> However what it means to associate a Utype to a TABLE is less clear.
> Should it be formally and completely equivalent to the referenced Class
> (supposedly) in a UML diagram?
> Should it then have the same attributes (represented as columns), for
> example?
> Or is the meaning less strict, more conceptual? Like "the TABLE is
> somewhat similar to a SimDB:Simulation", for example.

This is indeed much less clear.  I think it would be valuable and  
fairly easy to pin this down, and it would mean that you could envisage a  
future query which asked for (all of) a table by giving the table's  
UType.  The framework for this is in, for example, section 3 of the  
proposal <http://nxg.me.uk/note/2009/utype-proposals/#composite>,  
which effectively suggests that char:coverage.location.coord be a  
UType which has a structured value, and this is perfectly strict.

However, as Doug and others have argued, the current primary use-case  
for UTypes is the notion of a list of key-value pairs, where the keys  
are UTypes and the values are literals (ie, columns or single  
values).  I think there are benefits and few costs to going beyond  
that (if one has a clear idea of what one is doing), but that's where  
this argument would live.
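
To make the key-value use-case concrete, here is a minimal Python sketch in which an application picks out the UTypes it recognises by exact string comparison on the full URI.  The names RECOGNISED and select_recognised, and the key Simulated.Bar, are hypothetical; only Simulated.Foo appears in the example earlier in this message.

```python
SIMDB = "http://www.ivoa.net/dm/simdb/v1.0#"

# The UTypes this hypothetical application knows how to handle.
RECOGNISED = {SIMDB + "Simulated.Foo"}


def select_recognised(pairs, recognised=RECOGNISED):
    """Keep only the key-value pairs whose key is a recognised UType.

    Keys are compared as opaque strings; nothing is parsed out of them.
    """
    return {utype: value for utype, value in pairs.items() if utype in recognised}


pairs = {
    SIMDB + "Simulated.Foo": "xxx",
    SIMDB + "Simulated.Bar": "yyy",  # invented key, ignored by this application
}
selected = select_recognised(pairs)
```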

All the best,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester