UType proposals

Thu Jul 2 13:58:33 PDT 2009

Hi Norman -

In the interests of reducing the email explosion and keeping this
discussion manageable, I respond to only a few key points below,
collected together from the dozen or so emails I received.

Soon I think we should stand back and try to organize this discussion
better, and get back to the draft UTYPE specification from Mireille
which we already discussed in the DM sessions at the interop.

More detailed comments follow.

 	- Doug

On Thu, 2 Jul 2009, Norman Gray wrote:
> From Doug:
>> This discussion still misses the point that it is more important to
>> specify the version of the entire data model than that of a single
>> attribute, since we are dealing here with data models, not single
>> quantities.  Whatever solution we adopt should take this as the first
>> priority.

> Quite apart from anything else, including the datamodel version in the UType 
> string means it cannot get lost.  If you have a UType string and a version 
> somewhere else in the 'context', then the two things _will_ get separated 
> somehow.

As noted earlier, the key issue here is that a data model is an object,
and it is the version of the entire object (data model) that we care
about.  In real data model applications we always know what object we
are dealing with, including the version.  If we extract a single item
from the data model, losing the context, then we have something new.
The actor that does this extraction is then responsible for adequately
defining whatever new object is produced.

Furthermore we really do not want to mix and match UTYPEs from
different versions of the same model.  Whatever scheme we adopt should
discourage this, not be designed to facilitate it.

A more basic issue is that explicitly including the version number
in a UTYPE would break one of the fundamental rules of UTYPEs,
which is that they can be used by end-user science applications
via simple case-insensitive string equivalence, without parsing.
If the version number were included in a UTYPE then all the UTYPEs of
a data model would change every time a new version of the data model
is encountered.  On the other hand, if the implementation deals with a
versioned data model, most of the UTYPEs can be expected to remain
the same between versions.  It is usually pretty easy to deal with
version changes at the level of the whole data model, as typically
only a few well controlled changes will occur between versions.
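To make the point concrete, here is a minimal sketch, assuming a hypothetical container class (DataModelInstance) that records the model name and version once for the whole object and keys its attributes by UTYPE. The model name "Spectrum", the version numbers, and the "Spectrum-1.0:" prefix syntax are illustrative only and are not taken from any IVOA specification. Application code looks up attributes by case-insensitive string equivalence and keeps working across versions, whereas a version embedded in each UTYPE changes every string.

```python
from dataclasses import dataclass, field


@dataclass
class DataModelInstance:
    """Hypothetical container in which the version belongs to the whole
    data model, not to the individual UTYPEs of its attributes."""
    model: str
    version: str
    values: dict = field(default_factory=dict)

    def get(self, utype: str):
        # Match UTYPEs by plain case-insensitive string equivalence;
        # the UTYPE string itself is never parsed.
        key = utype.casefold()
        for name, value in self.values.items():
            if name.casefold() == key:
                return value
        raise KeyError(utype)


# Two versions of the same (illustrative) model share their UTYPEs.
old = DataModelInstance("Spectrum", "1.0", {"Target.Class": "RadioQuietAGN"})
new = DataModelInstance("Spectrum", "1.1", {"Target.Class": "RadioQuietAGN"})

# The same application code works unchanged against both versions.
for inst in (old, new):
    print(inst.model, inst.version, inst.get("target.class"))

# With a version embedded in each UTYPE (hypothetical syntax), the strings
# differ between versions and simple equivalence no longer matches.
print("Spectrum-1.0:Target.Class".casefold() == "Spectrum-1.1:Target.Class".casefold())
```

The version check, if one is needed at all, happens once on inst.version rather than on every attribute lookup.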

On Thu, 2 Jul 2009, Norman Gray wrote:
> 1: My proposal is limited to providing an answer to (1), plus some discussion 
> of how UTypes are conceptualised.  The downsides of an HTTP URI are that it 
> is longer than the UTypes defined in SSA (but bytes are cheap), and that it 
> is not trivially compatible with current SSA implementations (though I have

The issue is not just SSA of course, but all of DAL, and essentially all
of DM.  SSA, SIAV2, TAP, DAL2 arch, GDS, Characterization, Observation,
etc. etc., plus 3-5 years of standards documents and implementations.

> 3: It's important to be clear about the distinctions between ontologies and 
> vocabularies.  Terms in a 'vocabulary' have rather loose meanings (not even 
> necessarily as precise as Roy's 'probabilistic'), and have a range of use 
> cases clustering around _searching_.  You can't do inferencing with them, and 
> they're not precise enough to use for data access.  A 'data model' is an 
> 'ontology'.  Data models are very important (and they are generally more 
> sophisticated things than vocabularies), but I don't believe we have to 
> finally settle this part of the argument yet.

Here is how Wikipedia defines Ontology:

     "...an ontology is a formal representation of a set of concepts
     within a domain and the relationships between those concepts. It
     is used to reason about the properties of that domain, and may
     be used to define the domain."

A data model is not an ontology; it is an object model.  A data
model might contain data which could be used with ontological tools,
e.g., a UCD or an astronomical object classification (Target.Class in
our example).  But the data model itself describes a specific class
of object as precisely as practical.  The goal is to be precise and
unambiguous, not to support inference, at least not directly.  All the
UTYPE gives us is a concise way to refer to the attributes of a data
model in the abstract, independent of representation.

On Thu, 2 Jul 2009, Norman Gray wrote:

>> Lets keep UTYPEs as simple tags used to identify data model attributes
>> in actual scientific data analysis code, and use other mechanisms
>> for these more specialized, occasionally useful, but less important
>> capabilities.  The #1 thing here is to be able to use the data model
>> for good old fashioned scientific analysis and computation.
>
> You don't _have_ to make the URI dereferenceable.  If so, then it's a simple 
> tag, which just happens to have colons and slashes in it.  If you then change 
> your mind, you can make it dereferenceable.  If it's just a dead string, 
> however, then you're stuck -- there's no possibility of future expansion 
> without inventing _another_ mechanism.

A UTYPE is not "just a dead string".  It is a concise reference to
an individual attribute of a data model.  The data model however
is a complex entity and can have all kinds of features which we do
not need to encode within each individual UTYPE.  In particular we
can easily reference a schema, look up documentation, reference an
associated ontology, or even define a rule to convert a UTYPE from
the data model into one or more URIs.  We can even use the data model
to do an actual scientific calculation if anyone still cares!

It is far *more* powerful to defer these more complex semantics to the
data model itself than to try to pick one such feature and have it
determine how we represent the UTYPE.  One sees this in every one of
the sample URIs: all we need is the context and the thing after the "#"
to uniquely define what we are dealing with, e.g., "RadioQuietAGN" or
"Target.Class".  The URL with all of its powerful capabilities is still
there, it is just that it is part of the namespace (object) reference.
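As a rough illustration of keeping the URL in the namespace reference, here is a minimal sketch assuming one possible conversion rule: the UTYPE is attached as a fragment to the data model's namespace URL, and splitting at the "#" recovers both parts. The namespace URL and the function names utype_to_uri and split_uri are hypothetical and serve only to show the idea; urldefrag is the standard Python library call for separating a URL from its fragment.

```python
from urllib.parse import urldefrag

# Hypothetical namespace URL standing in for a data model reference;
# it is not a real IVOA identifier.
MODEL_NS = "http://example.org/datamodels/spectrum-1.0"


def utype_to_uri(namespace: str, utype: str) -> str:
    """One possible conversion rule: attach the UTYPE to the data model's
    namespace URL as a fragment."""
    return f"{namespace}#{utype}"


def split_uri(uri: str) -> tuple:
    """Recover (namespace, UTYPE) by splitting the URI at the '#'."""
    result = urldefrag(uri)
    return result.url, result.fragment


uri = utype_to_uri(MODEL_NS, "Target.Class")
print(uri)  # http://example.org/datamodels/spectrum-1.0#Target.Class
print(split_uri(uri))
```

The UTYPE itself stays the short, simple tag, and the full URI can be generated on demand from the model context whenever a dereferenceable form is wanted.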