UTYPEs and UFIs
Norman Gray
norman at astro.gla.ac.uk
Sat Sep 22 05:24:10 PDT 2007
Jonathan, hello.
On 2007 Sep 18, at 13:38, Jonathan McDowell wrote:
> I have prepared a note on my view of the role and syntax of UTYPEs
I think this is an excellent plan -- it feels as if UTYPEs (or
utypes, or UTypes, ...) have long been the magic spell that's going
to solve the VO's modelling problems, always _next_ year, so some
explicitness is probably overdue.
I prepared a Note proposing a UType syntax earlier this year
<http://www.ivoa.net/Documents/latest/utype-uri.html>. That Note proposed a
UType syntax as well as discussing some of the benefits this would
bring, so it may have tried to do too many things at once.
Here are a few comments on your UType proposal.
Section 1.2, Namespacing
It seems impossible for there _not_ to be namespacing support, since
without namespacing, we can have no versioning, profiling or
extension. Omitting namespacing requires that the first version of a
data model be perfect, and that astronomy will not change thereafter.
Yes, namespaces should be URLs (and dereferenceable). XML is the
well-known example of this, but the notion is much more general.
Section 1.1, syntax
Although in section 1.6 you make the point that syntax and semantics
are separate things (and I heartily agree with you), the discussion
in section 1.1 seems to me to crush the two things together.
Defining the UType as a structured string implies, and indeed
imposes, a hierarchical structure on the UTypes. The potential for
cropping and the case-insensitivity mean that an application has to
do some normalisation before two UTypes can be compared. The fact
that a UType has any structure at all means that applications will be
obliged to parse it to some extent, which means they'll get it wrong
sometimes, and who's to clear up the mess? The parsing is not
complicated, of course, but it's more trouble than just using the
string as-is. This section doesn't discuss why these extra costs are
necessary -- it's not as if users will routinely be typing these
things in (surely).
I propose that a UType be, in principle, simply a URI with fragment.
The part before the hash acts as the namespace (and when dereferenced
could give human- or machine-readable documentation for the namespace
in question), and the opaque sequence of characters after it is the
within-namespace part of it.
You never know: you might be able to get the UType spec onto a single
page!
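
To make the URI-with-fragment idea concrete, here is a minimal sketch in Python. The function name split_utype is hypothetical and belongs to no UType spec; the example URI is the illustrative one used elsewhere in this message. It shows only that the namespace is everything up to and including the hash, and that two UTypes can be compared as plain strings, with no normalisation.

```python
def split_utype(utype_uri):
    """Split a UType URI into (namespace, local_name) at the '#'.

    Hypothetical helper: the namespace keeps its trailing '#', so that
    namespace + local_name gives back the original URI.
    """
    namespace, hash_char, local_name = utype_uri.partition("#")
    if not hash_char or not local_name:
        raise ValueError(f"not a URI with fragment: {utype_uri!r}")
    return namespace + hash_char, local_name


# Illustrative URI only; example.edu is not a real data model namespace.
uri = "http://example.edu/myns/1.0#utype"
namespace, local_name = split_utype(uri)

# The local part is treated as opaque, so equality is plain string equality.
is_same = (namespace + local_name == uri)
```

Nothing here looks inside the local part, which is the point: an application that only needs to recognise a UType never has to parse it.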
How that URI appears in a document would depend on the
serialisation. In the case of any XML document it could use the XML
namespace mechanisms:
  <element xmlns:xx='http://example.edu/myns/1.0#'>
    <subelem utype='xx:utype'/>
  </element>
VOTable might have a different mechanism, and the same UType might
appear in FITS as:
TUTNS4='http://example.edu/myns/1.0#'
TUTYP4='utype'
That is, although the UType is a URI for specification purposes, it
need never appear as such in a real VO document. This means the
namespacing mechanism barely needs specifying at all (less syntax to
specify and get wrong), and the required processing can be handled by
any language which can do string concatenation (which even includes
Fortran).
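
As a rough illustration of how little processing that is, the following Python sketch recovers the full UType URI from the two serialisations above. The function names are hypothetical, the FITS header is modelled as a plain dictionary rather than read with any FITS library, and the TUTNSn/TUTYPn keywords are only the illustrative ones suggested in this message, not an agreed convention.

```python
def expand_prefixed(value, prefixes):
    """Expand 'prefix:local' to a full URI, given a prefix-to-namespace map."""
    prefix, colon, local_name = value.partition(":")
    if not colon or prefix not in prefixes:
        raise KeyError(f"no namespace declared for {value!r}")
    return prefixes[prefix] + local_name


def expand_fits(header, column):
    """Join the (hypothetical) TUTNSn and TUTYPn keywords for one column."""
    return header[f"TUTNS{column}"] + header[f"TUTYP{column}"]


# The XML case: the namespace declared on the enclosing element.
xml_prefixes = {"xx": "http://example.edu/myns/1.0#"}
from_xml = expand_prefixed("xx:utype", xml_prefixes)

# The FITS case: header keywords held in a dictionary for illustration.
fits_header = {"TUTNS4": "http://example.edu/myns/1.0#", "TUTYP4": "utype"}
from_fits = expand_fits(fits_header, 4)

# Both serialisations denote the same UType.
same_utype = (from_xml == from_fits)
```

In both cases the work is a lookup followed by a concatenation, and the resulting strings can be compared directly.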
I'm not suggesting that the within-namespace UType be a completely
opaque blob of characters. It would be wise for a UType spec to give
some very firm guidance about the format of UType strings -- for
example indicating that they should reflect any hierarchical
structure within the data model. This makes it easier for DM
maintainers and application authors to manage or generate them. But
I see no need for DM authors to be second-guessed by having the
syntax mandated in advance in the UType spec.
Sections 1.3 and 1.4, combining models and introducing UFIs
I feel that the notational complexity of these two sections comes
about from defining syntax and semantics at the same time.
The two examples in these sections are 'the thing which is the
Resolution.PosAngle.Value of a RedshiftFrame.CustomRefPos.Coordinate'
and 'the Char.CharacterizationAxis which has UCD X'. These two
examples are instances of the same problem: how do you take a
complicated idea and turn it into a sequence of characters? Sections
1.3 and 1.4 are two separate ad-hoc solutions, each of which is about
as complicated as it could get, and both of which present separate
parsing challenges.
A bold solution to this problem is to straightforwardly rewrite the
two quoted sentences above in formal terms (yes, in RDF, that's what
it's for), which can be as expressive and as extensible as we will
ever need, and then as a conceptually separate step, discuss how to
serialise that in an XML, VOTable, or FITS file. The first step is
well understood, and the second problem is already solved.
OK: that would make the spec two pages long, but who's counting?
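
To give a flavour of what "in formal terms" might mean, here is a small Python sketch that writes the second example, 'the Char.CharacterizationAxis which has UCD X', as RDF-style triples held in ordinary tuples. Only the rdf:type URI is a real identifier; the namespace, the CharacterizationAxis and ucd terms, and the helper function are assumptions made up for illustration, and a real application would use an RDF library rather than a set of tuples.

```python
# Hypothetical data model namespace, reusing the illustrative URI above.
NS = "http://example.edu/myns/1.0#"
# The standard RDF 'type' property.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each statement is a (subject, predicate, object) triple.
# "_:axis1" and "_:axis2" are blank-node labels for two axes in some dataset.
triples = {
    ("_:axis1", RDF_TYPE, NS + "CharacterizationAxis"),
    ("_:axis1", NS + "ucd", "X"),
    ("_:axis2", RDF_TYPE, NS + "CharacterizationAxis"),
    ("_:axis2", NS + "ucd", "Y"),
}


def subjects_matching(statements, predicate, obj):
    """Return the subjects of all triples with this predicate and object."""
    return {s for (s, p, o) in statements if p == predicate and o == obj}


# 'The CharacterizationAxis which has UCD X' is the intersection of two
# simple patterns, with no special-purpose string syntax to parse.
axes_with_ucd_x = (
    subjects_matching(triples, RDF_TYPE, NS + "CharacterizationAxis")
    & subjects_matching(triples, NS + "ucd", "X")
)
```

The compound idea is carried by the statements themselves, so each UType remains a simple URI; how the triples are written into XML, VOTable or FITS is then the separate serialisation question.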
Best wishes (see you next week!),
Norman
--
------------------------------------------------------------
Norman Gray : http://nxg.me.uk
eurovotech.org : University of Leicester, UK