UType proposals
Norman Gray
norman at astro.gla.ac.uk
Fri Jun 26 01:19:15 PDT 2009
Doug, hello.
On 2009 Jun 18, at 23:23, Doug Tody wrote:
> This discussion is getting very long and involved so I will only
> respond to your first few points for now.
The later points were more about the use of UTYPEs in context (and I
end up recapitulating them below); these earlier ones were about
rationale and motivation.
> On Thu, 18 Jun 2009, Norman Gray wrote:
>> The problems with the current informal UTYPE usage are:
>>
>> * The UTYPEs are unversioned -- there seems no provision for v1.1
>> of a UTYPE
>
> Not so. UTYPEs are defined by a data model, and it is the data model
> which is versioned. UTYPEs have no meaning unless they are defined
> by some such data model. There is always a higher level versioned
> context whenever a UTYPE is used. The UTYPEs in SSA for example are
> defined by the specific version of the SSA protocol in use.
I think this is a key point. The corollary of this is that SSA UTYPEs
will _lose_ their meaning once they're separated from their context,
and thus if you want to store the UTYPEs outside the context of an
SSA transaction (such as in a database, or a FITS file, or some other
format yet to be invented), then you are absolutely required to retain
the link to (the version of) the SSA transaction which initially
retrieved the object. In other words, it means that the SSA _data
model_ is closely tied to the SSA _protocol_, which will make it at
least inconvenient to reuse it in some different application.
Further, the Characterisation data model isn't even associated with a
protocol, so each application's reuse of those UTYPEs would have to
specify some additional, ad-hoc machine-readable mechanism for
associating the UTYPEs with a document version. If some application
then wanted to use two different versions of the Char'n UTYPEs at once
(for example to indicate for compatibility reasons that a particular
value was a Char'n v1 Position _and_ a Char'n v2 Position), then it
would have to do that disambiguation in some application-specific way.
I think it's generally true that making things context-dependent
makes them brittle.
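
To make the disambiguation problem concrete, here is a minimal sketch of
what an application would have to carry around if a UTYPE only has
meaning within a versioned context. The class name QualifiedUtype, the
function qualify, and the strings "Char" and "Position" are illustrative
assumptions, not taken from any IVOA document.

```python
from typing import NamedTuple


class QualifiedUtype(NamedTuple):
    """A UTYPE plus the versioned data model that gives it meaning.

    Hypothetical structure, for illustration only.
    """
    model: str
    version: str
    name: str


def qualify(name, model, version):
    """Attach the defining model and version to a bare UTYPE string."""
    return QualifiedUtype(model, version, name)


# The same bare string "Position" under two versions of the same model.
v1 = qualify("Position", "Char", "1")
v2 = qualify("Position", "Char", "2")

# A value labelled with both versions at once needs both qualified forms;
# the bare strings alone would be indistinguishable.
labels = {v1, v2}
```

The bare names compare equal while the qualified forms do not, which is
the information that is lost when the UTYPE is stored without its context.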
>> * There is no underlying model for UTYPEs, beyond the vague
>> assertion that they 'point into a data model'. The current UTYPE
>> documents go into some detail about the punctuation within a UTYPE,
>> but don't even approach such basic questions as 'is this a property
>> or a type?' This means that things like the composite UTYPEs of
>> Mireille's draft (the ones with the semicolon, which I believe are
>> eminently defensible) are introduced without any framework for a
>> discussion of what is actually going on here. Without some such
>> framework, there is nothing ahead but muddle.
>
> I agree that we do have some finer points regarding UTYPE usage
> to resolve, but in general it is up to the context (data model or
> whatever) which uses the UTYPEs to define their meaning.
It is certainly true that it is a data model's responsibility to
define the meaning of an individual UTYPE, but there is a conceptual
layer below that -- roughly, what _is_ a UTYPE -- on which there must
surely be some VO-wide consensus, and which I cannot see as merely
fine detail.
> We have
> fairly well developed concepts for how to map a set of classes
> to UTYPEs for example; SSA/SpectrumGDS, Char, SimDB, etc. do this
> already and it is pretty straightforward.
I am not suggesting otherwise. Clearly one can define a
transformation from UML (for example, in the case of Char'n) to
strings, and I recall quickly developing with Gerard an XSLT script to
do this mechanically for XMI files. However, if we are explicit about
what we are doing in this process, then the resulting UTYPEs will be
useful in contexts beyond those we've already thought of.
>> Although they should of course be informed by implementations,
>> standards do not exist merely to 'document current practice'.
>
> It is not just a matter of implementations. We have a number of
> standards already in use which define UTYPEs. While we might refine
> the concept and usage of a UTYPE, we cannot afford to invalidate
> existing VO standards and their implementations (aka "current
> practice"), unless there is a very compelling reason to do so.
The point of this current exercise (I take it) is to retrospectively
clarify what UTYPEs are.
I should perhaps re-emphasise that I am not suggesting
invalidating anything.
With care, a precise definition of the notion of UTYPEs could be made
in such a way that the existing UTYPE lists in existing standards were
compatible with a new UTYPE standard without a single byte's change.
For example, if it were retrospectively declared that any application
processing SSA v1.04 protocol files (in XML) should act as if the
namespace xmlns:ssa='blah...' were defined, and therefore act _as if_
this namespace were concatenated to the front of the UTYPEs in the
protocol, then each and every application currently processing SSA
protocol UTYPEs would be compatible with what I am suggesting without
a single line of code being changed.
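
The "as if" expansion described above can be sketched in a few lines.
This is only an illustration under stated assumptions: the namespace URI
below is a made-up placeholder (the real value is elided as 'blah...' in
this message), and the function name expand_utype and the sample UTYPE
string are hypothetical.

```python
# Hypothetical namespace URI: the message does not give the real one.
ASSUMED_NAMESPACES = {"ssa": "http://example.org/hypothetical/ssa-1.04#"}


def expand_utype(utype, namespaces=ASSUMED_NAMESPACES):
    """Return the UTYPE with a known prefix replaced by its namespace URI.

    Strings without a recognised prefix are returned unchanged.
    """
    prefix, sep, local = utype.partition(":")
    if not sep or prefix not in namespaces:
        return utype
    return namespaces[prefix] + local


# "ssa:Target.Name" is used here purely as an illustrative UTYPE string.
expanded = expand_utype("ssa:Target.Name")
```

An application that never calls such a function keeps working on the
short strings exactly as before; the expansion is only a declared
interpretation of them.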
>> I take it that a UTYPE standard is intended to be useful for the
>> next two to four decades of developments on the web, and larger,
>> more intricate, and more heterogeneous datasets in astronomy. ...
>
> It is a mistake to assume that we should be Web-centric about the
> facilities we provide for processing astronomical data and
> manipulating science data models. Data models are abstractions which
> are technology neutral. In general processing modules in science
> software should not know anything about how they are being used, and
> we often want such software to survive many years of evolution of
> external infrastructure technology such as the Web.
I'm certainly with you on the notion that we should be designing for
decades' worth of forward compatibility. However, I don't see that an
ASCII byte string starting 'ssa:' is so much less of a hostage to the
future than an ASCII byte string starting 'http:'. As an extreme
example, in a couple of decades bytes may be rendered obsolete by
quantum-bytes (to get science-fictional for a moment), but that doesn't
matter because there will definitely be a migration path from ... erm
... newtonian-bytes to quantum-bytes. Similarly, while it's not going
to last for a millennium, the web really _isn't_ a passing fad, and
since the information infrastructure of the whole planet -- not just
astronomy -- depends on it, we can be as certain as we can be of
anything in technology that there will be a smooth and standardised
migration path to whatever comes after it.
> We also have plenty of cases where we want
> such software to be able to function without an Internet connection.
> We have many other similar cases like this, e.g., UCD and Unit are
> also not Web-centric in any way.
In the proposal I've described, I emphasised that there is no need for
there to be any internet connection when a UTYPE is being used -- it's
just an opaque string at that point. In the proposal I suggested
requiring that UTYPEs be dereferenceable, but even if no attempt were
made to make a UTYPE dereferenceable, the model I've described has
the virtue of explicitness, plus the obvious and immediate namespacing
benefits deriving from use of the DNS. If the
UTYPE is dereferenceable, and there is a network connection, and the
application or the software build process feels like it, then there is
some extra information available by dereferencing the UTYPE, but even
these contingent benefits are forever closed off to a UTYPE which is a
'dead string' by design.
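
A rough sketch of the distinction drawn here: comparing UTYPEs needs no
network at all, while dereferencing is an optional extra. The function
names and the example URI are hypothetical, and the fetch argument
stands for whatever retrieval mechanism an application might choose to
supply.

```python
from urllib.parse import urlsplit


def is_same_utype(a, b):
    """UTYPEs are opaque strings: equality is plain string comparison."""
    return a == b


def naming_authority(utype):
    """Return the DNS name that scopes a URI-style UTYPE, if there is one."""
    return urlsplit(utype).hostname


def describe(utype, fetch=None):
    """Optionally dereference a UTYPE.

    With no fetch function (for example, when offline) this returns None
    and the caller carries on using the UTYPE as an opaque string.
    """
    if fetch is None:
        return None
    try:
        return fetch(utype)
    except OSError:
        return None


# Hypothetical URI-style UTYPE, for illustration only.
example = "http://example.org/hypothetical/ssa-1.04#Target.Name"
```

Only describe ever touches the network, and only when the caller hands it
a way to do so.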
And I'm not at all interested in UCDs and Units (for this present
discussion) -- they're very different things from UTYPEs.
I hope these observations are helpful.
Best wishes,
Norman
--
Norman Gray : http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester