Schemas (and utypes)
Norman Gray
norman at astro.gla.ac.uk
Tue Jul 21 09:33:38 PDT 2009
Arnold, hello.
On 2009 Jul 2, at 16:51, Arnold Rots wrote:
> You touch on one of the central issues that have made me very
> uncomfortable with Utypes (but I assume everyone is well aware that I
> don't like them). See below.
I've taken the liberty of adjusting the subject line here, partly (and
_very_ importantly) in order to keep this separate from the ongoing
what-is-a-utype discussion, but also because I believe your points
touch on a larger and interesting issue to do with schemas in
general. By 'schemas' here, I mean RDBMS schemas or XML Schemas (in
RDF, 'schema' means something different).
>> This is presuming that ns:target.class isn't one of those utypes that
>> only makes sense when it's coordinated with a set of other utypes from
>> the same model (goal 1 of utypes, as I understand it). If it makes
>> sense by itself, then that's excellent: it means that it's been
>> artfully repurposed here, and an application can reliably and safely
>> understand this bit of XML without necessarily having heard of the
>> <whatisit> element before.
>
> This is the crux of the matter. A model never consists of a single
> item. It is usually described by a set of information items (for lack
> of a better term) that together convey the full meaning that the
> author intends to convey.
I agree with that to a pretty good approximation. However, a key
point in your remark is "the full meaning that the author intends to
convey", to which we can add "the full meaning that the reader intends,
hopes, or aspires to extract", which may be very different.
> The problem with Utypes is that they allow cherry picking of
> information items with no guarantee that the information is complete,
> or even makes sense. Consistency, completeness, and uniqueness have
> been abandoned.
You say "cherry picking", I say "loose coupling". I want to argue
that utypes, like simple schemas, do indeed "[allow] cherry picking of
information items with no guarantee that the information is complete,
or even makes sense", but that this is not a practical problem.
I presume you're thinking of the consistency which the STC schema
provides, by virtue of its _syntactical_ insistence that all the
elements of a point's coordinates (for example) are included in a
message. I recall watching STC discussions on the virtues or vices of
defaulting versus explicit 'not known' remarks, and as you know I'm
aware of many of the complications of specifying astronomical
coordinate systems.
In the more-or-less loosely coupled network environment we're all
talking about, which is too complicated for one-size-fits-all rules, I
believe that this level of syntactical specification adds consistency
at the expense of adding brittleness and unnecessary complication.
That is because, ultimately, the schema doesn't add much value to the
message: if there are relevant information items missing from the
message then it is the consuming application -- and _only_ the
consuming application -- which is competent to say so, and to default,
fail, or respond appropriately to the originator. Further, a message
could pass even the most stringent syntactic validation and still be
nonsense as far as the application is concerned.
Thus schemas can act as sanity-checks and no more. They don't
realistically relieve the consuming application from any
responsibility for error-checking.[1]
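To make that concrete, here is a minimal Python sketch, assuming a message represented as a plain dictionary. The item names and the REQUIRED table are hypothetical, not drawn from STC or any real utype list. The message passes a purely syntactic check, and only the consuming application's own range check rejects it.

```python
# A hypothetical message; the keys stand in for information items that a
# schema might require. None of these names come from STC or any utype list.
message = {"ra": 83.63, "dec": 123.0, "frame": "ICRS"}

# Hypothetical "schema": which items must be present, and with what type.
REQUIRED = {"ra": float, "dec": float, "frame": str}


def passes_schema(msg):
    """Syntactic sanity check: every required item present, with the right type."""
    return all(isinstance(msg.get(k), t) for k, t in REQUIRED.items())


def usable_position(msg):
    """Application-level check: only the consumer knows what 'sensible' means."""
    if not passes_schema(msg):
        return False
    # A declination outside [-90, 90] degrees is nonsense, yet it is a valid float.
    return 0.0 <= msg["ra"] < 360.0 and -90.0 <= msg["dec"] <= 90.0


print(passes_schema(message))    # True
print(usable_position(message))  # False
```

The schema check is still worth something as a sanity check, but the decision about whether the message is usable stays with the application.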
What that means in turn is that the _real_ role of schemas and utypes
is a fairly modest one, concerned simply with indicating which parts
of a message are to be identified as what, at a syntactic level or not
much higher (this is the intuition behind "a pointer into a data
model").
The job of reassembling all these information items into a data-model
instance, ontology, Java object, FITS file or whatever you want
happens at a different layer, and it's in that layer that
appropriate cherry-picking will be accepted, and inappropriate
cherry-picking rejected, depending on the needs of the application
that's doing the reassembling. The utype model is therefore a good
match to a world of heterogeneous applications, data and uses (my
suggestions are intended to make this match better, but the utype
model is a good one nevertheless).
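A rough Python sketch of that layering follows, assuming information items arrive as (label, value) pairs. The labels are hypothetical stand-ins for utypes, and the two consumer functions are invented for illustration. Each consumer picks out only what it needs and decides for itself whether to default a missing item or fail.

```python
# Each information item is tagged with a label saying what it is. The labels
# are hypothetical stand-ins for utypes, not taken from any IVOA data model.
items = [
    ("pos.ra", 83.63),
    ("pos.dec", 22.01),
    ("target.class", "supernova remnant"),
]


def pick(items, wanted):
    """Return only the items a consumer asks for, ignoring everything else."""
    return {label: value for label, value in items if label in wanted}


def sky_plotter(items):
    """Needs only a position; ignores target.class entirely."""
    got = pick(items, {"pos.ra", "pos.dec"})
    if len(got) < 2:
        raise ValueError("no usable position")
    return (got["pos.ra"], got["pos.dec"])


def coordinate_converter(items):
    """Needs a frame too; this consumer chooses to default it rather than fail."""
    got = pick(items, {"pos.ra", "pos.dec", "pos.frame"})
    if "pos.ra" not in got or "pos.dec" not in got:
        raise ValueError("no usable position")
    frame = got.get("pos.frame", "ICRS")  # application-specific default
    return (got["pos.ra"], got["pos.dec"], frame)


print(sky_plotter(items))           # (83.63, 22.01)
print(coordinate_converter(items))  # (83.63, 22.01, 'ICRS')
```

The same message is acceptable to both consumers for different reasons, and neither the default nor the failure condition lives in the message itself.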
Best wishes,
Norman
[1] I wouldn't go as far as to say that schemas are useless. I can
see that there are some situations where code-generation is useful,
and they can provide for contract checking ("whose fault is it that
this message couldn't be parsed?"), but they don't have the semi-
magical properties that would warrant the amount of interop agony
sustained when arguing over them.
--
Norman Gray : http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester, UK