UType proposals

Tue Jul 21 07:42:33 PDT 2009

Doug and all, hello.

I confess I've taken a bit of a holiday from this discussion, and been
off-line for part of the time; however, there is still something to add
to Doug's last message.

On Thu, 2 July 2009, Doug Tody wrote:
 > In the interests of reducing the email explosion while trying to keep
 > this discussion manageable I respond to only a few key points below,
 > and collect them all together from the dozen or so emails I received.

Fine by me.  Let's talk about these points, then.

 > Soon I think we should stand back and try to organize this discussion
 > better, and get back to the draft UTYPE specification from Mireille
 > which we already discussed in the DM sessions at the interop.

So far, my contributions to this discussion have been:

   1. a detailed statement of what I believe the problem to be
   <http://nxg.me.uk/note/2009/utype-questions>;

   2. a detailed proposal, explicitly aligned with Mireille's proposal,
   responding to those questions, and stating its assumptions
   <http://nxg.me.uk/note/2009/utype-proposals/>;

   3. an attempt to distinguish the three distinct strands of discussion
   <http://www.ivoa.net/forum/semantics/0907/0936.htm>, and stop them
   interfering with each other (this proved to be futile).

I'm sure there are a number of ways we can organise the discussion
better.

 > > On Thu, 2 Jul 2009, Norman Gray wrote:
 > > From Doug:
 > > > This discussion still misses the point that it is more important to
 > > > specify the version of the entire data model than that of a single
 > > > attribute, since we are dealing here with data models, not single
 > > > quantities.  Whatever solution we adopt should take this as the first
 > > > priority.
 > >
 > > Quite apart from anything else, including the datamodel version in the
 > > UType string means it cannot get lost.  If you have a UType string and
 > > a version somewhere else in the 'context', then the two things _will_
 > > get separated somehow.
 >
 > As noted earlier, the key issue here is that a data model is an object,
 > and it is the version of the entire object (data model) that we care
 > about.  In real data model applications we always know what object we
 > are dealing with, including the version.  If we extract a single item
 > from the data model, losing the context, then we have something new.
 > The actor that does this extraction is then responsible for adequately
 > defining whatever new object is produced.

I have not suggested extracting a single item from the data model.  I
have suggested that it is good to be able to refer to a single item from
the data model in isolation.  As I have made very clear, I specifically
want to avoid losing the context of such an item if one does this.

If an actor were to extract such an item (and I think this would be
legitimate in some circumstances, enough that an application should be
at least given the freedom to do this) then of course it would be
responsible for defining the meaning of the resulting data.

 > Furthermore we really do not want to mix and match UTYPEs from
 > different versions of the same model.  Whatever scheme we adopt should
 > discourage this, not be designed to facilitate it.

You may not want to do this; others do.  The IVOA is the forum for
discussing the implied compromise.

 > A more basic issue is that explicitly including the version number
 > in a UTYPE would break one of the fundamental rules of UTYPEs,
 > which is that they can be used by end-user science applications
 > via simple case-insensitive string equivalence, without parsing.

Where is this 'fundamental rule' written down?  Nowhere.  It's certainly
the first I've heard of it.  Are there any other 'fundamental rules' of
UTYPEs?

And what do you mean by 'parsing'?  If you mean reassembling a URI from
a "prefix:item" pair, prior to simple string-matching of the result, I'm
suggesting that's deserialisation, not parsing.

I don't think you can mean parsing the content of the UTYPE (to find
selectors and so on), since I've been very clear that I'm not suggesting
that.
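
A minimal sketch of the distinction drawn above, in Python.  The prefix
bindings and namespace URIs are hypothetical, invented purely for
illustration; nothing here is taken from an IVOA document.  The point is
only that the "prefix:item" pair is reassembled into a URI and then
compared as a whole string, with the item never being inspected.

```python
# Hypothetical prefix-to-namespace bindings; these URIs are illustrative only.
NAMESPACES = {
    "ssa": "http://example.org/dm/ssa/1.0#",
    "char": "http://example.org/dm/characterisation/1.1#",
}


def expand(utype: str, namespaces: dict[str, str]) -> str:
    """Reassemble the full URI from a "prefix:item" pair.

    The item is treated as an opaque string: nothing inside it is inspected.
    """
    prefix, sep, item = utype.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"no namespace bound to the prefix of {utype!r}")
    return namespaces[prefix] + item


def same_utype(a: str, b: str, namespaces: dict[str, str]) -> bool:
    """Compare two serialised UTypes by plain string equality of their URIs."""
    return expand(a, namespaces) == expand(b, namespaces)
```

With these assumed bindings, expand("ssa:Target.Class", NAMESPACES)
yields "http://example.org/dm/ssa/1.0#Target.Class", and the comparison
that follows is ordinary string equality.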

 > If the version number were included in a UTYPE then all the UTYPEs of
 > a data model would change every time a new version of the data model
 > is encountered.  On the other hand if the implementation deals with a
 > versioned data model, most of the UTYPEs can be expected to remain
 > the same between versions.  It is usually pretty easy to deal with
 > version changes at the level of the whole data model, as typically
 > only a few well controlled changes will occur between versions.

How often, I wonder, have I heard at interops 'no, we can't make that
change, because that would change the schema'.  I remember the Registry WG
experiencing a certain amount of angst on this topic.  I don't think
everyone feels that version changes at DM level are quite as trivial as
you're suggesting.

 > On Thu, 2 Jul 2009, Norman Gray wrote:
 >
 > > 1: My proposal is limited to providing an answer to (1), plus some
 > > discussion of how UTypes are conceptualised.  The downsides of an HTTP
 > > URI are that it is longer than the UTypes defined in SSA (but bytes are
 > > cheap), and that it is not trivially compatible with current SSA
 > > implementations (though I have
 >
 > The issue is not just SSA of course, but all of DAL, and essentially all
 > of DM.  SSA, SIAV2, TAP, DAL2 arch, GDS, Characterization, Observation,
 > etc. etc., plus 3-5 years of standards documents and implementations.

SSA: is a standard
SIAv2: is PR
TAP: is a WD
DAL2 arch: I can't find this at http://www.ivoa.net/Documents/
GDS: Google found a mention on slide 21 of
   <http://www.ivoa.net/internal/IVOA/200905DALSessions/siapv2-may09.pdf>,
   nothing on http://www.ivoa.net/Documents/
Characterisation: a Standard
Observation: a 2005 Note <http://www.ivoa.net/Documents/latest/DMObs.html>
   is the only thing I can find at ivoa.net, though I know there's plenty
   of activity on this

That's two standards, which use an apparently fundamental technology
without that technology being at all standardised.  The issue of the
vagueness of UTypes was raised, by me and others, years ago, but any
concerns were dismissed.

In any case, I have already very clearly described how this UTypes
proposal can be made byte-for-byte compatible with existing standardised
protocols.

 > > 3: It's important to be clear about the distinctions between ontologies
 > > and vocabularies.  Terms in a 'vocabulary' have rather loose meanings
 > > (not even necessarily as precise as Roy's 'probabilistic'), and have a
 > > range of use cases clustering around _searching_.  You can't do inferencing
 > > with them, and they're not precise enough to use for data access.  A
 > > 'data model' is an 'ontology'.  Data models are very important (and they
 > > are generally more sophisticated things than vocabularies), but I don't
 > > believe we have to finally settle this part of the argument yet.
 >
 > Here is how Wikipedia defines Ontology:
 >
 >    "...an ontology is a formal representation of a set of concepts
 >    within a domain and the relationships between those concepts. It
 >    is used to reason about the properties of that domain, and may
 >    be used to define the domain."
 >
 > A data model is not an ontology, it is an object model.

You've selected a very logician-friendly definition of ontology.
<http://en.wikipedia.org/wiki/Ontology_(information_science)>

The most-commonly cited one-line explanation of what an ontology is is
"a formal, explicit specification of a shared conceptualisation" (Gruber
1993 in the wikipedia page).  Depending on application and audience,
that can be taken to cover the whole spectrum from ultra-formal things
all the way down to folksonomies, but the literature increasingly takes
the useful/practical cut-off for the term 'ontology' to be some system
with "a formal is-A".  For example, imagine an OO class 'GeometricObject',
which might have a 'paint()' method; if a class 'Circle' is declared to
extend that class, then we can deduce that it also has a 'paint()' method.
'Circle' in this conceptualisation is a sub-class of 'GeometricObject'.
That's a "formal is-A".  An object model is an ontology.
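
A minimal sketch of that deduction, in Python.  The class and method
names come from the example above; the body of 'paint()' is invented for
illustration.

```python
class GeometricObject:
    def paint(self) -> str:
        # Placeholder behaviour, invented for this sketch.
        return f"painting a {type(self).__name__}"


class Circle(GeometricObject):
    """Declares only that a Circle is-a GeometricObject; defines no methods."""


# The sub-class relation is stated explicitly...
print(issubclass(Circle, GeometricObject))  # True
# ...so the presence of paint() on Circle follows without being declared.
print(hasattr(Circle, "paint"))             # True
print(Circle().paint())                     # painting a Circle
```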

Not that it matters.  The text you've quoted is from an attempt
<http://www.ivoa.net/forum/semantics/0907/0936.htm> to make it clear that
conversations about ontologies, vocabularies, and all that are third-order
problems, for the future.

 > But the data model itself describes a specific class
 > of object as precisely as practical.  The goal is to be precise and
 > unambiguous, not to support inference, at least not directly.  All the
 > UTYPE gives us is a concise way to refer to the attributes of a data
 > model in the abstract, independent of representation.

That sounds like a pretty good definition of an ontology.  Inference is a
side-issue here.

Let's just not talk about ontologies -- they're simply not an important
part of the current question.

 > On Thu, 2 Jul 2009, Norman Gray wrote:
 >
 > > > Let's keep UTYPEs as simple tags used to identify data model attributes
 > > > in actual scientific data analysis code, and use other mechanisms
 > > > for these more specialized, occasionally useful, but less important
 > > > capabilities.  The #1 thing here is to be able to use the data model
 > > > for good old fashioned scientific analysis and computation.
 > >
 > > You don't _have_ to make the URI dereferenceable.  If so, then it's a
 > > simple tag, which just happens to have colons and slashes in it.  If you
 > > then change your mind, you can make it dereferenceable.  If it's just a
 > > dead string, however, then you're stuck -- there's no possibility of
 > > future expansion without inventing _another_ mechanism.
 >
 > A UTYPE is not "just a dead string".  It is a concise reference to
 > an individual attribute of a data model.  The data model however
 > is a complex entity and can have all kinds of features which we do
 > not need to encode within each individual UTYPE.

Tell me who it is who's talking about encoding things within the UTYPE
string, so I can stand by your side and disagree with them emphatically.

I have only ever proposed simple string comparison of UTypes.

The ONLY substantive difference between Mireille's proposal and mine is
what counts as 'the UType'.  I'm suggesting that 'the UType' be such that
applications handling UTypes act AS IF the "foo:" prefix were replaced
by the namespace to give 'the UType'.  I'm not suggesting they actually
need to do this explicitly, but only act AS IF they've done it.  I hope
the proposal makes clear what benefits follow from this.
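
A minimal sketch of the AS IF behaviour, in Python.  The namespace URIs,
the prefix "spec" and the helper 'utype_key' are hypothetical, chosen
only to illustrate the idea; an application need not build the string
explicitly so long as its comparisons give the same answers.

```python
# Hypothetical namespace URIs for two versions of one data model.
MODEL_V1 = "http://example.org/dm/spectrum/1.0#"
MODEL_V2 = "http://example.org/dm/spectrum/2.0#"


def utype_key(prefix: str, item: str, bindings: dict[str, str]) -> str:
    """Return the string an application should behave as if it compared."""
    return bindings[prefix] + item


# Two documents that happen to choose different prefixes for one namespace.
doc_a = {"foo": MODEL_V1}
doc_b = {"spec": MODEL_V1}
# A third document that reuses the prefix "foo" for a later model version.
doc_c = {"foo": MODEL_V2}

a = utype_key("foo", "Target.Class", doc_a)
b = utype_key("spec", "Target.Class", doc_b)
c = utype_key("foo", "Target.Class", doc_c)

print(a == b)  # True: the prefix is local detail, the namespace is the context
print(a == c)  # False: same prefix and item, but a different model version
```

Under these assumptions the choice of prefix is irrelevant to the
comparison, while the model version travels with the string.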

Finally:

 > It is far *more* powerful to defer these more complex semantics to the
 > data model itself, than to try to pick one such feature and have it
 > determine how we represent the UTYPE.  One sees this in every one of
 > the sample URIs: all we need is the context and the thing after the "#"
 > to uniquely define what we are dealing with, e.g., "RadioQuietAGN" or
 > "Target.Class".  The URL with all of its powerful capabilities is still
 > there, it is just that it is part of the namespace (object) reference.

If I'm understanding what you're saying here, I'm having some difficulty
in identifying the difference between our positions.

 > It is far *more* powerful to defer these more complex semantics to the
 > data model itself,

Absolutely.  The data model is where the 'meaning' is (whatever 'meaning'
means here!), and the UTYPE is simply a name for part of it.

 > than to try to pick one such feature and have it
 > determine how we represent the UTYPE.

I don't really know what you're referring to here.

 > One sees this in every one of
 > the sample URIs: all we need is the context and the thing after the "#"
 > to uniquely define what we are dealing with, e.g., "RadioQuietAGN" or
 > "Target.Class".

I can imagine myself writing the same text.

The only difference I can see between our positions is that you want the
context to be implied, if I follow you correctly (eg in para 2 of
<http://www.ivoa.net/forum/dm/0906/1644.htm>), by some attribute of the
transaction that retrieved the data in question.  My problem with that,
as you know, is that I feel this implication is too diffuse, and that
there's too much danger, in practice, of the context becoming separated
from the stored data.

Thus the only contrast I can see, between the different ways we read
your text above, is that while I agree that the thing after the "#"
indicates which element of the data model we're talking about, the bit
before it compactly, completely, explicitly and uniquely names the context
using a namespace URI.

Best wishes,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester, UK