UType proposals

Thu Jun 18 14:11:34 PDT 2009

Doug, hello.

On 2009 Jun 15, at 05:42, Douglas Tody wrote:

> On Thu, 11 Jun 2009, Norman Gray wrote:
>
>>  http://nxg.me.uk/note/2009/utype-proposals/
>>
>> This document is not intended to be a counterproposal. I believe it  
>> is at heart the same proposal as Mireille's, but arrived at from a  
>> rather different direction, and so justified in a different way.  
>> The principal syntactic difference is that the model-name Utype  
>> elements are here references to a namespace, rather than regarded  
>> as the namespace themselves.
>
> It seems to me we are covering ground again here that we have been
> over before.  I do not see sufficient justification for trying to
> morph UTYPEs into URLs (or XPaths etc.), certainly nothing sufficient
> to change current use of UTYPEs drastically.  We have been using
> UTYPEs in IVOA interfaces and implementations quite successfully for
> several years now.  While there are details remaining to be specified
> and minor tweaks are possible, no compelling case has been made for
> a major departure from current UTYPE usage.

The problems with the current informal UTYPE usage are:

   * The UTYPEs are unversioned -- there seems to be no provision for a
v1.1 of a UTYPE.

   * The UTYPEs are 'dead' strings -- they don't lead to documentation  
or further machine-readable information.  We live in a networked  
world, and it seems perverse to ignore that.

   * There is no underlying model for UTYPEs, beyond the vague  
assertion that they 'point into a data model'.  The current UTYPE  
documents go into some detail about the punctuation within a UTYPE,  
but don't even approach such basic questions as 'is this a property or  
a type?'  This means that things like the composite UTYPEs of  
Mireille's draft (the ones with the semicolon, which I believe are  
eminently defensible) are introduced without any framework for a  
discussion of what is actually going on here.  Without some such  
framework, there is nothing ahead but muddle.

I can't speak for anyone else, but each one of these problems is  
compelling to me at least.

> Mireille's draft on the
> other hand is already very close to both documenting current practice
> while clarifying the remaining details.

Although they should of course be informed by implementations,  
standards do not exist merely to 'document current practice'.

I take it that a UTYPE standard is intended to be useful for the next
two to four decades of developments on the web, and for larger, more
intricate, and more heterogeneous datasets in astronomy.  A standard
which merely attempts to document the last few years of SSA
implementations is _not_ practical for the future.

> The key concept which I think is wrong here is the desire to be able
> to have each UTYPE be a self-contained, separable object (which is what
> the URL representation provides).

Who is proposing this?  (Do you mean 'object' as in OOP?)  If you
mean me, I think I must have explained myself very badly indeed.

> This is just not needed in real use
> cases as UTYPEs are only used to tag the individual properties of a
> more complex object.  There are multiple such properties, each with
> its own UTYPE tag, for any such object, at least in any real world
> use case.  We do not use such object properties (UTYPEs) as separate
> stand-alone objects, rather we use the object these object properties
> collectively refer to.  In normal usage multiple such object properties
> (UTYPEs) will be needed to represent, understand and use the object.

Indeed.  I believe that's exactly what I'm proposing.

As I intended to emphasise, my 'proposal' is in most respects  
identical to Mireille's.  There are only two differences.  Firstly,  
I'm aiming to describe what appears to be the underlying model for  
UTYPEs, which therefore provides a rationale for them, and answers  
questions such as 'what are UTYPEs?' and 'what is the equality  
function for UTYPEs?'

Secondly, I'm suggesting that in a case such as

    <VOTABLE xmlns:simdb='http://www.ivoa.net/dm/simdb/v1.0#'>
      ...
      <param id='foo' utype='simdb:Simulated.Foo'>
        xxx
      </param>
    </VOTABLE>

an application should act _as if_ the utype were
<http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo>.  Nothing in that
requires displaying the complete URL to the user, or implies some
object-oriented approach.  There is nothing more to it than that, and
this is effectively identical (apart from minor syntactic
considerations and the underlying rationale) to Mireille's proposal.
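To make the 'act as if' rule concrete, here is a minimal Python sketch.  It assumes a UTYPE is written as prefix:LocalName, with the prefix bound by an xmlns declaration like the one on the VOTABLE element above.  The names expand_utype and same_utype are hypothetical and are not part of any IVOA standard.  Under this reading the equality function for UTYPEs is simply string equality of the expansions, which also answers the versioning question: a v1.1 namespace (the URL below is made up) gives a different UTYPE.

```python
def expand_utype(utype: str, namespaces: dict[str, str]) -> str:
    """Return the URI an application should act 'as if' the UTYPE were.

    Assumes the form 'prefix:LocalName', with the prefix bound in
    `namespaces` (a hypothetical prefix -> namespace-URI mapping taken
    from the xmlns declarations in scope).
    """
    prefix, sep, local_name = utype.partition(":")
    if not sep or not local_name:
        raise ValueError(f"UTYPE {utype!r} is not of the form prefix:LocalName")
    if prefix not in namespaces:
        raise ValueError(f"namespace prefix {prefix!r} is not declared")
    return namespaces[prefix] + local_name


def same_utype(a: str, ns_a: dict[str, str], b: str, ns_b: dict[str, str]) -> bool:
    """Hypothetical equality function: compare expanded URIs, not prefixes."""
    return expand_utype(a, ns_a) == expand_utype(b, ns_b)


simdb_10 = {"simdb": "http://www.ivoa.net/dm/simdb/v1.0#"}
print(expand_utype("simdb:Simulated.Foo", simdb_10))
# http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo

# A different prefix bound to the same namespace names the same UTYPE...
print(same_utype("simdb:Simulated.Foo", simdb_10,
                 "sdb:Simulated.Foo", {"sdb": "http://www.ivoa.net/dm/simdb/v1.0#"}))
# True

# ...while a (hypothetical) v1.1 namespace names a different one.
print(same_utype("simdb:Simulated.Foo", simdb_10,
                 "simdb:Simulated.Foo", {"simdb": "http://www.ivoa.net/dm/simdb/v1.1#"}))
# False
```

The prefix is only a local abbreviation here.  What the application compares and stores is the expanded URI.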

> Another issue is that UTYPEs are not merely hidden metadata that
> no one ever needs to look at.  Rather they are a primary part of
> the (technical) user interface of the software and protocols we
> use for access to data and other objects.  A client application
> for example would typically manipulate data models using UTYPEs
> (or their context-specific aliases) to access the attributes of an
> object instance.  It is the *serialization* of the object (be it
> VOTable, FITS, a parameter set, etc.)  that we want to hide from
> the developer writing code to manipulate some object.

I think we're on the same page here.

This is the notion -- am I correct? -- that it should be possible to  
describe the state of some instance of a data model (which you're  
calling an 'object' here, and which might well be an object in the OOP  
sense) using a set of key-value pairs, where the keys are UTYPEs and  
the values are literals such as strings or numbers.  If so, then we're  
definitely talking about the same thing, since that was goal 1 in my  
'proposal' document.

> The UTYPE is
> the primary construct providing representation-independent access
> to the semantic content of an object instance, and is visible to
> the developer.  Hence we do care what it looks like.

Yes, and I was careful to note that "UTypes should be reasonably  
readable by a developer."

I'm afraid I just don't see how
<http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo> is significantly
less readable than "simdb:Simulated.Foo", especially since in a
VOTable, say, it would generally appear as "simdb:Simulated.Foo".

Presumably you're not suggesting that something like  
"ssa:Char.TimeAxis.Coverage.Location.Value" (randomly picked from  
Mireille's document) would appear in a user-facing UI.  You mention  
'context-specific aliases', but where is this alias to come from?  How  
is it to be associated with the long UTYPE?  What are the bounds of  
the 'context'?  How is the 'context' named, described or retrieved?

If you mean 'display label' or something like it, then I have  
described a principled, already-standardised and immediately  
_practical_ mechanism for describing that and whatever else needs to  
be associated with the UTYPE now and in the next couple of decades.   
That can be deployed _today_.  This wheel does not need to be
reinvented.

> While there might be some use in being able to look up some HTML for
> an individual UTYPE, it is much more important to be able to look up
> the documentation for the data model, since in general this is what
> we want to understand.  In general it is not that useful to look only
> at an individual object property.  Once we can look up a referenced,
> versioned data model there will be many ways we can get documentation
> for individual data model attributes, each with their UTYPE tag.

Yes, and in what I was describing there would be only one HTML  
document -- the IVOA REC for a data model.  The only extra structure  
I'm suggesting is that each of the UTYPEs documented there is
described within an HTML <a name='Simulated.Foo'> element.  You put
the UTYPE into your browser and you get the original authoritative  
documentation from the appropriate subsection of the REC; when you  
want the context and the overall data model description, you just read  
the rest of that same document.
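Resolving a UTYPE to its documentation then needs nothing beyond ordinary URL handling.  The Python sketch below assumes the expanded-URI form and the <a name='...'> convention just described.  It uses the standard library's urldefrag to split the URI into the REC's address and the anchor name, then checks that a fragment of REC HTML actually defines that anchor.  The function and class names and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


def documentation_location(utype_uri: str) -> tuple[str, str]:
    """Split an expanded UTYPE into (REC document URL, anchor name)."""
    doc_url, anchor = urldefrag(utype_uri)
    if not anchor:
        raise ValueError(f"{utype_uri!r} has no fragment naming a UTYPE")
    return doc_url, anchor


class AnchorNames(HTMLParser):
    """Collect the values of name='...' attributes on <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.names: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "name" and value is not None:
                    self.names.add(value)


doc_url, anchor = documentation_location(
    "http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo")
print(doc_url, anchor)
# http://www.ivoa.net/dm/simdb/v1.0 Simulated.Foo

# Stand-in for the REC's HTML; a real client would fetch doc_url instead.
rec_html = "<h3><a name='Simulated.Foo'>Simulated.Foo</a></h3><p>...</p>"
parser = AnchorNames()
parser.feed(rec_html)
print(anchor in parser.names)
# True
```

A browser performs the same split when it scrolls to the named subsection, which is why no additional lookup machinery should be needed.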

> It could be easy for example to auto-generate a URL for an individual
> UTYPE given the UTYPE and the URL of the data model.

Then why not simply require that applications act _as if_ that URL  
were the name of the UTYPE?

> 2) the namespace reference and the
> individual object properties should be specified separately so that
> we do not duplicate the class reference in each object property
> (UTYPE), which aside from being unnecessarily verbose would make
> it much more difficult to ensure object integrity.

I don't see how

    <VOTABLE xmlns:simdb='http://www.ivoa.net/dm/simdb/v1.0#'>
      ...
      <param id='foo' utype='simdb:Simulated.Foo'>xxx</param>
      <param utype='simdb:Simulated.Bar'>yyy</param>
      ....
    </VOTABLE>

...is unnecessarily verbose or undermines object integrity.  An  
application is supposed to interpret this as a set of key-value pairs  
which allow it to reconstruct an instance of a data model object (this  
is correct, isn't it?).  That is, I believe this should imply a table  
in memory which will be the equivalent of

     http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo    xxx
     http://www.ivoa.net/dm/simdb/v1.0#Simulated.Bar    yyy
     ...
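The intended reading can be sketched in Python with the standard library's xml.etree.ElementTree.XMLPullParser, whose 'start-ns' events report each xmlns declaration.  The sketch follows the simplified markup of the example above, with a utype attribute on each element and the value as element text.  It does not follow the full VOTable schema, in which a PARAM carries its value in an attribute.  The function name utype_table is hypothetical.

```python
import xml.etree.ElementTree as ET


def utype_table(document: str) -> dict[str, str]:
    """Map expanded UTYPE URI -> element text for every element with a utype."""
    parser = ET.XMLPullParser(events=("start-ns", "start", "end"))
    parser.feed(document)
    parser.close()

    scopes: list[dict[str, str]] = [{}]  # prefix -> namespace URI, per element
    pending: dict[str, str] = {}         # declarations for the next element
    table: dict[str, str] = {}
    for event, item in parser.read_events():
        if event == "start-ns":
            prefix, uri = item
            pending[prefix] = uri
        elif event == "start":
            # An element sees its parent's bindings plus its own declarations.
            scopes.append({**scopes[-1], **pending})
            pending = {}
        elif event == "end":
            utype = item.get("utype")
            if utype is not None:
                # An undeclared prefix raises KeyError in this sketch.
                prefix, _, local_name = utype.partition(":")
                table[scopes[-1][prefix] + local_name] = (item.text or "").strip()
            scopes.pop()
    return table


votable = """<VOTABLE xmlns:simdb='http://www.ivoa.net/dm/simdb/v1.0#'>
  <param id='foo' utype='simdb:Simulated.Foo'>xxx</param>
  <param utype='simdb:Simulated.Bar'>yyy</param>
</VOTABLE>"""
for key, value in utype_table(votable).items():
    print(key, value)
# http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo xxx
# http://www.ivoa.net/dm/simdb/v1.0#Simulated.Bar yyy
```

The namespace is declared once, on the VOTABLE element.  Each param carries only its short prefixed UTYPE, and the full keys exist only in the application's view of the data.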

If this is too verbose, and an application wants to manage this table  
differently, in order to save bytes of memory or something, then that  
would be fine, as long as its effect is equivalent to this.  I think  
I'm missing the point at which this verbosity is a problem.
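As one illustration of a table managed differently but with equivalent effect, the hypothetical class below stores each namespace URI once, with local names beneath it.  It still answers lookups by full UTYPE URI and can reproduce the full table on demand.  This is a sketch of that idea only, not a recommended design.

```python
from collections import defaultdict
from typing import Iterator


class CompactUtypeTable:
    """Hypothetical memory-saving store for UTYPE/value pairs.

    Each namespace URI is stored once, but lookups and iteration behave
    like a table keyed by the full expanded UTYPE URI.
    """

    def __init__(self) -> None:
        self._by_namespace: dict[str, dict[str, str]] = defaultdict(dict)

    def put(self, namespace: str, local_name: str, value: str) -> None:
        self._by_namespace[namespace][local_name] = value

    def get(self, utype_uri: str) -> str:
        # Find a namespace the full URI starts with, then look up the rest.
        for namespace, entries in self._by_namespace.items():
            if utype_uri.startswith(namespace):
                local_name = utype_uri[len(namespace):]
                if local_name in entries:
                    return entries[local_name]
        raise KeyError(utype_uri)

    def items(self) -> Iterator[tuple[str, str]]:
        # Reconstitute the full-URI view on demand.
        for namespace, entries in self._by_namespace.items():
            for local_name, value in entries.items():
                yield namespace + local_name, value


ns = "http://www.ivoa.net/dm/simdb/v1.0#"
compact = CompactUtypeTable()
compact.put(ns, "Simulated.Foo", "xxx")
compact.put(ns, "Simulated.Bar", "yyy")
print(compact.get("http://www.ivoa.net/dm/simdb/v1.0#Simulated.Bar"))
# yyy
print(dict(compact.items()) == {ns + "Simulated.Foo": "xxx", ns + "Simulated.Bar": "yyy"})
# True
```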

If I have explained this 'proposal' poorly, I'm sorry, and would  
welcome suggestions for improvement.  I emphasise that I believe the  
differences from Mireille's proposal are significant but slight, and  
make UTYPEs ready for the future.

Best wishes,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester