UType proposals

Fri Jun 12 07:31:55 PDT 2009

hi Norman 

> > The latter allows one to find the element represented by the string in
> > the model by parsing the string.
> 
> I'm nervous of 'parsing the string'.  Partly, this is because 
> defining the UType syntax which is to be parsed would I 
> believe place a substantial burden on the UType document -- 
> it's essentially defining a new little language, and there 
> are enough little languages in the world without us amateurs 
> defining a new one.  Also, however, making the UType 
> parseable would imply that users (ie, application authors in 
> the first instance) would be expected to parse it in order to 
> get important information, and that's not likely to be popular.
> 
> I also don't think it's necessary, since the sort of extra 
> information that might be available -- for example display 
> strings/labels -- are really part of the 'machine-readable 
> documentation' that would be readily available by 
> dereferencing the namespace (though I emphasise again that 
> applications would generally not have to do this).  Of 
> course, the form of that machine-readable documentation would 
> still have to be defined (though I have suggestions), but 
> there are existing extensible syntaxes for that, which don't 
> have to be crammed into a UType string.
> 

I really only meant parsing in the sense of "reading", or what you call
"informally parseable by humans".

> 
> > So I think it is not a bad thing for the DM group to propose a rule for
> > deriving Utype-s from the data model that does not require a separate
> > lookup, i.e. one that is parsable by humans, as it promotes homogeneity
> > and readability across different modelling efforts.
> 
> The recipe that Mireille proposes is informally parseable by humans,  
> and that has substantial mnemonic value if it is used 
> homogeneously.   
> It's just that turning that into a formally parseable thing 
> would have substantial costs with few benefits.
> 
So are you suggesting that it might be OK for the Utype document to
RECOMMEND, or SUGGEST, that data modellers use these rules? Or should the
document drop these rules altogether, in which case their "substantial
mnemonic value" would be lost as well?

> > One should be able to infer from the context within which the utype is
> > used where to go to find this namespace. That's why I think such
> > namespaces should, for example, be explicitly defined in a VOTable that
> > uses a particular set of Utype-s. It should not depend on some list of
> > abbreviations somewhere in the IVOA.
> > Hence I also do not like the proposal NOT to insist on an xmlns:adql=...
> > declaration to infer information about the new 'xtype' attribute.
> 
> I may be slightly missing your point (apologies if so), but 
> I'm suggesting that if the full URI (and only the full URI) 
> is regarded as the UType, then this gives a good deal of 
> flexibility to a serialisation.  Thus, we might have a 
> VOTable which has
> 
>      <param id='foo'
>             utype='http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo'>
>        xxx
>      </param>
> 
> or
>      <VOTABLE xmlns:simdb='http://www.ivoa.net/dm/simdb/v1.0#'>
>        ...
>        <param id='foo' utype='simdb:Simulated.Foo'>
>          xxx
>        </param>
>      </VOTABLE>
> 
> and these would be regarded as equivalent, and the procedure 
> for turning the second into the first would be the province 
> of the definition of the serialisation.  The second is 
> obviously a lot tidier, but the freedom to do this is merely 
> a detail of the serialisation, which piggybacks on the XML 
> namespace mechanism for the practical reason that VOTable is 
> expressed in XML.  A FITS serialisation would have to choose 
> a completely different way of representing this, if that were 
> deemed necessary.  That doesn't matter, because it's the full 
> URI -- possibly after some string concatenation -- which the 
> resulting application would be required to recognise.
> 
I was assuming that you meant utypes to be something like your first case.
I was referring to the second usage, including the explicit xmlns:simdb=...
declaration. What I gathered from the discussion of the new 'xtype'
attribute in the VOTable session was that no such xmlns declaration was
wanted, in a case that has similarities to what we are discussing here.
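The equivalence between the two forms can be illustrated with a minimal Python sketch. It assumes two things:

- the prefix is bound by an xmlns-style declaration in the VOTable;
- expansion is plain string concatenation of the namespace URI and the local part.

The helper names `namespace_bindings` and `expand_utype` are hypothetical. This is not a procedure defined by any VOTable specification, and the `param` element is copied from Norman's sketch above.

```python
# Minimal sketch: expand a prefixed UType such as "simdb:Simulated.Foo" to
# its full-URI form, ASSUMING the prefix is bound by an xmlns declaration and
# that expansion is simple concatenation (namespace URI + local part).
import io
import xml.etree.ElementTree as ET

VOTABLE_DOC = """<VOTABLE xmlns:simdb='http://www.ivoa.net/dm/simdb/v1.0#'>
  <param id='foo' utype='simdb:Simulated.Foo'>xxx</param>
</VOTABLE>"""


def namespace_bindings(xml_text):
    """Collect the prefix -> namespace URI bindings declared in the document."""
    bindings = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        bindings[prefix] = uri
    return bindings


def expand_utype(utype, bindings):
    """Return the full-URI form; leave the UType unchanged if its prefix is unbound."""
    prefix, sep, local = utype.partition(":")
    if sep and prefix in bindings:
        return bindings[prefix] + local
    return utype


bindings = namespace_bindings(VOTABLE_DOC)
prefixed = ET.fromstring(VOTABLE_DOC).find("param").get("utype")
full = "http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo"
print(expand_utype(prefixed, bindings) == full)  # True
```

A full URI passes through unchanged because its "http" prefix is not bound. Without the xmlns declaration, however, the short form cannot be expanded at all, which is the dependency I am worried about.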

> 
> > Finally, I think one thing that Mireille's note does not make clear is
> > that to be able to have a rule deriving parsable Utypes from a data
> > model such as the one used in SimDB, one must have defined the
> > syntactic elements for expressing one's data model. In SimDB we do
> > this explicitly, and we have proposed a similar approach to the DM
> > group. Once one has that, one may also have hope of creating instances
> > for a specified Utype.
> 
> I don't think I follow you.  By 'syntactic elements' do you 
> mean some parseable syntax for UTypes?
> 

No. What I meant was that if one wants to associate Utypes in a meaningful
way with a data model, one needs to understand what kind of data model
construct they may refer or correspond to.
The BNF-like syntax for utype-s implicitly assumes the existence of certain
data model concepts (the "syntactic elements"): Class, Model, Package etc.
Utypes may refer to any of these.
Currently the DM group does not have an agreed-upon language in which to
express data models. But in particular when you want machines to do
something with utypes, they must be able to find out what kind of thing
they are referring to. There is quite some difference between an attribute
and a reference, or between a package and a class. If we do not agree on a
language for expressing the data models, it will be hard to code against
them.
Again, this is what SimDB has actually done. Because of this we can write
code that uses metadata about a model (expressed in our intermediate
representation) to infer things about instances of the model in various
forms (XML, Java, RDB). Admittedly we are not using Utypes; these are too
limited for this purpose.
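The following hypothetical Python sketch illustrates why a machine needs a model description to know what kind of construct a utype points at. It is not SimDB's intermediate representation. The following are all assumptions made for illustration:

- the "people" model and its "core" package;
- the "employer" reference;
- the "prefix:Class.member" spelling.

```python
# Hypothetical sketch: a tiny machine-readable model description that lets
# code ask what kind of construct a UType refers to. Not SimDB's format.
from dataclasses import dataclass, field


@dataclass
class ModelClass:
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> datatype
    references: dict = field(default_factory=dict)  # reference name -> target class name


@dataclass
class Package:
    name: str
    classes: dict = field(default_factory=dict)  # class name -> ModelClass


def kind_of(utype, model_prefix, packages):
    """Classify 'prefix:Package', 'prefix:Class' or 'prefix:Class.member'.

    Assumes (hypothetically) that package and class names are unique within
    the model and that a member name follows its class after a single dot.
    """
    prefix, sep, local = utype.partition(":")
    if not sep or prefix != model_prefix:
        return "unknown"
    if local in packages:
        return "package"
    class_name, _, member = local.partition(".")
    for package in packages.values():
        cls = package.classes.get(class_name)
        if cls is None:
            continue
        if not member:
            return "class"
        if member in cls.attributes:
            return "attribute"
        if member in cls.references:
            return "reference"
    return "unknown"


individual = ModelClass(
    "Individual",
    attributes={"age": "int", "email": "string"},
    references={"employer": "Organisation"},
)
people = {"core": Package("core", classes={"Individual": individual})}

for u in ["people:core", "people:Individual", "people:Individual.age",
          "people:Individual.employer", "people:Individual.height"]:
    print(u, "->", kind_of(u, "people", people))
```

This prints package, class, attribute, reference and unknown for the five strings in turn. The classification comes from the model description, not from the string itself.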

> > Really finally, I think we still need discussion about what it really
> > means to associate a Utype with some serialised construct.
> > For example, in VOTable, associating a Utype with a column, pointing to
> > an attribute in a data model, seems pretty well defined.
> > The column contains values that may have been obtained from that
> > attribute on instances of the referenced class.
> 
> Yes, but...
> 
> > However, what it means to associate a Utype with a TABLE is less clear.
> > Should it be formally and completely equivalent to the referenced Class
> > (supposedly) in a UML diagram? Should it then have the same attributes
> > (represented as columns), for example?
> > Or is the meaning less strict, more conceptual, like "the TABLE is
> > somewhat similar to a SimDB:Simulation", for example?
> 
> This is indeed much less clear.  I think it would be valuable 
> and fairly easy to do this, and it would mean that you could 
> envisage a future query which asked for (all of) a table by 
> giving the table's UType.  The framework for this is in, for 
> example, section 3 of the proposal 
> <http://nxg.me.uk/note/2009/utype-proposals/#composite>,
> which effectively suggests that char:coverage.location.coord 
> be a UType which has a structured value, and this is perfectly strict.
> 
> However, as Doug and others have argued, the current primary 
> use-case for UTypes is the notion of a list of key-value 
> pairs, where the keys are UTypes and the values are literals 
> (ie, columns or single values).  I think there are benefits 
> and few costs to going beyond that (if one has a clear idea 
> of what one is doing), but that's where this argument would live.
> 
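The two readings Norman contrasts can be sketched in Python:

- flat UType/literal pairs;
- a single UType whose value is structured.

The leaf names ".ra" and ".dec" and the numeric values are hypothetical. They are not taken from the Characterisation data model, and `fold` is an illustrative helper only.

```python
# Sketch: the same information as flat UType/literal pairs and as one UType
# with a structured value. Leaf names and values are hypothetical.
flat = {
    "char:coverage.location.coord.ra": 10.0,
    "char:coverage.location.coord.dec": 20.0,
}
structured = {
    "char:coverage.location.coord": {"ra": 10.0, "dec": 20.0},
}


def fold(flat_pairs, parent):
    """Gather the flat pairs whose UType starts with `parent.` into one structured value."""
    prefix = parent + "."
    return {parent: {key[len(prefix):]: value
                     for key, value in flat_pairs.items()
                     if key.startswith(prefix)}}


print(fold(flat, "char:coverage.location.coord") == structured)  # True
```

Only under the structured reading could a query ask for the whole coordinate by giving the single UType "char:coverage.location.coord".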

I think utype originated as an extra attribute on FIELD, where I thought it
was supposed to assign extra meaning to the column in the table, somewhat
(but not very much) different from UCDs.
But utype attributes are now everywhere in VOTable, also on GROUP, TABLE and
RESOURCE. These are all complex constructs, and one might worry that in
general it may not be correct to assume a 1-1 relation to a complex
construct in a data model (unless designed to be so, as in SimDB's TAP
mapping).

For example, consider a model of people with a class Individual that has
the attributes
- firstName
- lastName
- age
- email 
- telephone number

In a VOTable, one might encounter a TABLE with name="Person" and FIELDs
(surname, dateOfBirth, emailAddress).
Could I add the utype people:Individual to the Person table?
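Under a strict reading, the TABLE must correspond attribute-for-attribute to the class. The question can then be made concrete with a coverage check like the hypothetical Python sketch below. The mapping from FIELD names to attributes is written by hand here, since nothing in a utype supplies it. The helper `coverage` is illustrative only.

```python
# Hypothetical check: how completely does a TABLE's FIELD list cover the
# attributes of the class named by its utype? The field-to-attribute mapping
# is hand-written; a utype by itself does not provide it.
individual_attributes = ["firstName", "lastName", "age", "email", "telephone number"]
person_fields = ["surname", "dateOfBirth", "emailAddress"]

# dateOfBirth is left unmapped on purpose: it determines age but is not the
# same attribute.
field_to_attribute = {"surname": "lastName", "emailAddress": "email"}


def coverage(attributes, fields, mapping):
    """Return (covered attributes, missing attributes, unmapped fields)."""
    covered = sorted({mapping[f] for f in fields if f in mapping})
    missing = [a for a in attributes if a not in covered]
    unmapped = [f for f in fields if f not in mapping]
    return covered, missing, unmapped


covered, missing, unmapped = coverage(individual_attributes, person_fields, field_to_attribute)
print("covered:", covered)           # covered: ['email', 'lastName']
print("missing:", missing)           # missing: ['firstName', 'age', 'telephone number']
print("unmapped fields:", unmapped)  # unmapped fields: ['dateOfBirth']
print("strict match:", not missing and not unmapped)  # strict match: False
```

A strict interpretation would reject the utype here. A more conceptual one ("somewhat similar to") might still accept it.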

I seem to recall you telling me about a concept in ontologies/vocabularies
that seems to me similar to the utype.
If I am not mistaken, in ontologies one can point from one ontology to
another and declare that a thing in the former is
similar(/equivalent/equal/?) to a thing in the latter. Is utype a similar
construct, and if so, which (if any) of these meanings might it correspond
to?

Cheers

Gerard