UType proposals

Fri Jun 26 01:19:15 PDT 2009

Doug, hello.

On 2009 Jun 18, at 23:23, Doug Tody wrote:

> This discussion is getting very long and involved so I will only
> respond to your first few points for now.

The later points were more about the use of UTYPEs in context (and I  
end up recapitulating them below); these earlier ones were about  
rationale and motivation.

> On Thu, 18 Jun 2009, Norman Gray wrote:
>> The problems with the current informal UTYPE usage are:
>>
>> * The UTYPEs are unversioned -- there seems no provision for v1.1  
>> of a UTYPE
>
> Not so.  UTYPEs are defined by a data model, and it is the data model
> which is versioned.  UTYPEs have no meaning unless they are defined
> by some such data model.  There is always a higher level versioned
> context whenever a UTYPE is used.  The UTYPEs in SSA for example are
> defined by the specific version of the SSA protocol in use.

I think this is a key point.  The corollary is that SSA UTYPEs
will _lose_ their meaning once they're separated from their context.
Thus if you want to store the UTYPEs outside of the context of an
SSA transaction (such as in a database, or a FITS file, or some other
format yet to be invented), then you are absolutely required to retain
the link to (the version of) the SSA transaction which initially
retrieved the object.  In other words, the SSA _data model_ is closely
tied to the SSA _protocol_, which will make it at least inconvenient
to reuse it in some different application.

Further, the Characterisation data model isn't even associated with a  
protocol, so each application's reuse of those UTYPEs would have to  
specify some additional, ad-hoc machine-readable mechanism for  
associating the UTYPEs with a document version.  If some application  
then wanted to use two different versions of the Char'n UTYPEs at once  
(for example to indicate for compatibility reasons that a particular  
value was a Char'n v1 Position _and_ a Char'n v2 Position), then it  
would have to do that disambiguation in some application-specific way.

I think it's generally true that making things context-dependent
makes them brittle.

>> * There is no underlying model for UTYPEs, beyond the vague  
>> assertion that they 'point into a data model'.  The current UTYPE  
>> documents go into some detail about the punctuation within a UTYPE,  
>> but don't even approach such basic questions as 'is this a property  
>> or a type?'  This means that things like the composite UTYPEs of  
>> Mireille's draft (the ones with the semicolon, which I believe are  
>> eminently defensible) are introduced without any framework for a  
>> discussion of what is actually going on here.  Without some such  
>> framework, there is nothing ahead but muddle.
>
> I agree that we do have some finer points regarding UTYPE usage
> to resolve, but in general it is up to the context (data model or
> whatever) which uses the UTYPEs to define their meaning.

It is certainly true that it is a data model's responsibility to  
define the meaning of an individual UTYPE, but there is a conceptual  
layer below that -- roughly, what _is_ a UTYPE -- which there must  
surely be some VO-wide consensus on, and which I cannot see as merely  
fine detail.

> We have
> fairly well developed concepts for how to map a set of classes
> to UTYPEs for example; SSA/SpectrumGDS, Char, SimDB, etc. do this
> already and it is pretty straightforward.

I am not suggesting otherwise.  Clearly one can define a
transformation from UML (for example, in the case of Char'n) to
strings, and I recall quickly developing with Gerard an XSLT script to
do this mechanically for XMI files.  However, if we are explicit
about what we are doing in this process, then the resulting
UTYPEs will be useful in contexts beyond those we've already thought of.

>> Although they should of course be informed by implementations,  
>> standards do not exist merely to 'document current practice'.
>
> It is not just a matter of implementations.  We have a number of
> standards already in use which define UTYPEs.  While we might refine
> the concept and usage of a UTYPE, we cannot afford to invalidate
> existing VO standards and their implementations (aka "current
> practice"), unless there is a very compelling reason to do so.

The point of this current exercise (I take it) is to retrospectively  
clarify what UTYPEs are.

I should perhaps re-emphasise that I am not suggesting
invalidating anything.

With care, a precise definition of the notion of UTYPEs could be made  
in such a way that the existing UTYPE lists in existing standards were  
compatible with a new UTYPE standard without a single byte's change.   
For example, if it were retrospectively declared that any application  
processing SSA v1.04 protocol files (in XML) should act as if the  
namespace xmlns:ssa='blah...' were defined, and therefore act _as if_  
this namespace were concatenated to the front of the UTYPEs in the  
protocol, then each and every application currently processing SSA  
protocol UTYPEs would be compatible with what I am suggesting without  
a single line of code being changed.
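
To make the 'act as if' rule concrete, here is a minimal sketch of what
a new consumer might do.  It is one possible reading of the rule, not a
definitive implementation: the namespace URI is a placeholder I have
made up (the real value is deliberately left as 'blah...' above), and
the function name and the example UTYPE string are purely illustrative.

```python
# Hypothetical namespace URI, for illustration only; the actual value
# is not specified in this message.
DEFAULT_NAMESPACES = {"ssa": "http://example.org/ssa-v1.04#"}


def expand_utype(utype, namespaces=DEFAULT_NAMESPACES):
    """Return the UTYPE with a known prefix replaced by its namespace URI.

    Strings with no prefix, or with a prefix that is not in the
    mapping, are returned unchanged.
    """
    prefix, sep, local = utype.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return utype


# Illustrative UTYPE string; an existing application would keep
# handling the short form exactly as it does today.
short_form = "ssa:Target.Name"
long_form = expand_utype(short_form)
```

Existing software never needs to call anything like expand_utype();
only software that wants the fully qualified name would apply the
mapping.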

>> I take it that a UTYPE standard is intended to be useful for the  
>> next two to four decades of developments on the web, and larger,  
>> more intricate, and more heterogeneous datasets in astronomy. ...
>
> It is a mistake to assume that we should be Web-centric about the
> facilities we provide for processing astronomical data and
> manipulating science data models.  Data models are abstractions
> which are technology neutral.  In general processing modules in
> science software should not know anything about how they are being
> used, and we often want such software to survive many years of
> evolution of external infrastructure technology such as the Web.

I'm certainly with you on the notion that we should be designing for  
decades' worth of forward compatibility.  However, I don't see that an  
ASCII-byte string starting 'ssa:' is so much less of a hostage to the
future than an ASCII-byte string starting 'http:'.  As an extreme
example, in a couple of decades bytes may be rendered obsolete by  
quantum-bytes (let's get science fiction), but that doesn't matter  
because there will definitely be a migration path from ... erm ...  
newtonian-bytes to quantum-bytes.  Similarly, while it's not going to  
last for a millennium, the web really _isn't_ a passing fad, and since  
the information infrastructure of the whole planet -- not just  
astronomy -- depends on it, we can be as certain as we can of anything  
in technology, that there will be a smooth and standardised migration  
path to whatever comes after it.

> We also have plenty of cases where we want
> such software to be able to function without an Internet connection.
> We have many other similar cases like this, e.g., UCD and Unit are
> also not Web-centric in any way.

In the proposal I've described, I emphasised that there is no need for  
there to be any internet connection when a UTYPE is being used -- it's  
just an opaque string at that point.  In the proposal I suggested
requiring that UTYPEs be dereferenceable, but even with no attempt
at all to make the UTYPE dereferenceable, the model I've
described has the virtue of explicitness, plus the obvious and
immediate namespacing benefits deriving from use of the DNS.  If the
UTYPE is dereferenceable, and there is a network connection, and the  
application or the software build process feels like it, then there is  
some extra information available by dereferencing the UTYPE, but even  
these contingent benefits are forever closed off to a UTYPE which is a  
'dead string' by design.
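
As a minimal sketch of the 'opaque string' point: the two URIs below
are made-up placeholders, not real UTYPEs, and the helper name is
hypothetical.  Nothing in it touches the network, and the two versions
are told apart purely because the strings differ.

```python
# Hypothetical UTYPE URIs, for illustration only.
CHAR_V1_POSITION = "http://example.org/char/v1#Position"
CHAR_V2_POSITION = "http://example.org/char/v2#Position"


def has_utype(utypes, wanted):
    """Check membership by plain string comparison; no network access."""
    return wanted in utypes


# A value labelled as both a v1 and a v2 Position, with no
# surrounding protocol or document context needed to disambiguate.
value_utypes = {CHAR_V1_POSITION, CHAR_V2_POSITION}
```

An application that only knows about v1 would test for the v1 string
and ignore the other.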

And I'm not at all interested in UCDs and Units (for this present  
discussion) -- they're very different things from UTYPEs.

I hope these observations are helpful.

Best wishes,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester