UTYPEs and UFIs

Sat Sep 22 05:24:10 PDT 2007

Jonathan, hello.

On 2007 Sep 18, at 13:38, Jonathan McDowell wrote:

>  I have prepared a note on my view of the role and syntax of UTYPEs

I think this is an excellent plan -- it feels as if UTYPEs (or  
utypes, or UTypes, ...) have long been the magic spell that's going  
to solve the VO's modelling problems, always _next_ year, so some  
explicitness is probably overdue.

I prepared a Note proposing a UType syntax earlier this year
<http://www.ivoa.net/Documents/latest/utype-uri.html>.  That Note also
discussed some of the benefits this syntax would bring, so it may have
tried to do too many things at once.

Here are a few comments on your UType proposal.

Section 1.2, Namespacing

It seems impossible for there _not_ to be namespacing support, since  
without namespacing, we can have no versioning, profiling or  
extension.  Omitting namespacing requires that the first version of a  
data model be perfect, and that astronomy will not change thereafter.

Yes, namespaces should be URLs (and dereferenceable).  XML is the  
well-known example of this, but the notion is much more general.

Section 1.1, syntax

Although in section 1.6 you make the point that syntax and semantics  
are separate things (and I heartily agree with you), the discussion  
in section 1.1 seems to me to crush the two things together.

Defining the UType as a structured string implies, and indeed  
imposes, a hierarchical structure on the UTypes.  The potential for  
cropping and the case-insensitivity mean that an application has to  
do some normalisation before two UTypes can be compared.  The fact  
that a UType has any structure at all means that applications will be  
obliged to parse it to some extent, which means they'll get it wrong  
sometimes, and who's to clear up the mess?  The parsing is not  
complicated, of course, but it's more trouble than just using the  
string as-is.  This section doesn't discuss why these extra costs are  
necessary -- it's not as if users will routinely be typing these  
things in (surely).
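Here is a minimal Python sketch of that cost. It models only the case-insensitivity rule and leaves out the cropping rules. The function names are hypothetical. Two spellings of the same structured UType are unequal as raw strings, so they need a normalisation step. An opaque identifier needs only plain equality.

```python
# Sketch: comparing structured UTypes under a case-insensitivity rule.
# Only case-folding is modelled here; cropping rules are NOT implemented.


def same_structured_utype(a: str, b: str) -> bool:
    """Hypothetical comparison for case-insensitive structured UTypes."""
    return a.casefold() == b.casefold()


def same_opaque_utype(a: str, b: str) -> bool:
    """Opaque identifiers: plain string equality, no parsing at all."""
    return a == b


a = "Char.CharacterizationAxis"
b = "char.characterizationaxis"
print(same_opaque_utype(a, b))      # False: the raw strings differ
print(same_structured_utype(a, b))  # True, but only after normalising
```

Every application must apply exactly the same normalisation, or it will disagree with the others about which UTypes are equal.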

I propose that a UType be, in principle, simply a URI with a fragment.
The part before the hash acts as the namespace (and when dereferenced
could give human- or machine-readable documentation for the namespace
in question), and the opaque sequence of characters after it is the
within-namespace part of the UType.
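Here is a minimal sketch of that split in Python. It uses the standard library's urllib.parse.urldefrag. The function name split_utype and the example namespace are illustrative assumptions. The only parsing needed is to find the hash. The local part stays opaque.

```python
from urllib.parse import urldefrag


def split_utype(utype_uri: str) -> tuple[str, str]:
    """Split a UType URI into (namespace, within-namespace part).

    The namespace keeps its trailing '#', so namespace + local gives back
    the original URI. The local part is treated as an opaque string.
    """
    base, fragment = urldefrag(utype_uri)
    if not fragment:
        raise ValueError(f"not a UType URI (no fragment): {utype_uri!r}")
    return base + "#", fragment


ns, local = split_utype("http://example.edu/myns/1.0#utype")
print(ns)     # http://example.edu/myns/1.0#
print(local)  # utype
```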

You never know: you might be able to get the UType spec onto a single  
page!

How that URI appears in a document would depend on the  
serialisation.  In the case of any XML document it could use the XML  
namespace mechanisms:

     <element xmlns:xx='http://example.edu/myns/1.0#'>
       <subelem utype='xx:utype'/>
     </element>
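Resolving the prefix in that example takes a little care. XML parsers expand namespace prefixes in element and attribute names, but not in attribute values. Here is a minimal Python sketch that uses the standard library's ElementTree 'start-ns' events to track which prefixes are in scope. The function name expand_utypes is hypothetical. The sketch assumes every utype value has the form prefix:local.

```python
import io
import xml.etree.ElementTree as ET


def expand_utypes(xml_text: str) -> list[str]:
    """Return the full URI for each utype='prefix:local' attribute.

    ElementTree reports namespace declarations as 'start-ns' events just
    before the element that declares them. We record them per element so
    that prefixes go out of scope again at that element's 'end' event.
    """
    in_scope: dict[str, list[str]] = {}  # prefix -> stack of URIs
    declared: list[list[str]] = []       # prefixes declared by each open element
    pending: list[str] = []              # declarations awaiting their element
    found: list[str] = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item
            in_scope.setdefault(prefix, []).append(uri)
            pending.append(prefix)
        elif event == "start":
            declared.append(pending)
            pending = []
            utype = item.get("utype")
            if utype is not None:
                prefix, sep, local = utype.partition(":")
                uris = in_scope.get(prefix)
                if not sep or not uris:
                    raise ValueError(f"cannot resolve utype={utype!r}")
                found.append(uris[-1] + local)  # plain concatenation
        else:  # "end": this element's declarations go out of scope
            for prefix in declared.pop():
                in_scope[prefix].pop()
    return found


doc = """<element xmlns:xx='http://example.edu/myns/1.0#'>
  <subelem utype='xx:utype'/>
</element>"""
print(expand_utypes(doc))  # ['http://example.edu/myns/1.0#utype']
```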

VOTable might have a different mechanism, and the same UType might  
appear in FITS as:

     TUTNS4='http://example.edu/myns/1.0#'
     TUTYP4='utype'

That is, although the UType is a URI for specification purposes, it  
need never appear as such in a real VO document.  This means the  
namespacing mechanism barely needs specifying at all (less syntax to  
specify and get wrong), and the required processing can be handled by  
any language which can do string concatenation (which even includes  
Fortran).
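Here is a minimal Python sketch of the FITS case. It assumes the header has already been read into a dictionary. It uses the TUTNSn/TUTYPn keyword pair from the example above, which is illustrative and not a set of registered FITS keywords. Rebuilding the UType takes one concatenation, and the result is an ordinary string that can be compared with ==.

```python
def fits_utype(header: dict[str, str], column: int) -> str:
    """Rebuild the UType URI for FITS table column `column`.

    Assumes the illustrative TUTNSn (namespace) and TUTYPn (local part)
    keywords; they are not standard FITS keywords.
    """
    namespace = header[f"TUTNS{column}"]
    local = header[f"TUTYP{column}"]
    return namespace + local  # the entire 'namespace mechanism'


header = {"TUTNS4": "http://example.edu/myns/1.0#", "TUTYP4": "utype"}
print(fits_utype(header, 4))  # http://example.edu/myns/1.0#utype
```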

I'm not suggesting that the within-namespace UType be a completely  
opaque blob of characters.  It would be wise for a UType spec to give  
some very firm guidance about the format of UType strings -- for  
example indicating that they should reflect any hierarchical  
structure within the data model.  This makes it easier for DM
maintainers and application authors to manage or generate them.  But  
I see no need for DM authors to be second-guessed by having the  
syntax mandated in advance in the UType spec.

1.3 and 1.4, combining models and introducing UFIs

I feel that the notational complexity of these two sections comes  
about from defining syntax and semantics at the same time.

The two examples in these sections are 'the thing which is the  
Resolution.PosAngle.Value of a RedshiftFrame.CustomRefPos.Coordinate'  
and 'the Char.CharacterizationAxis which has UCD X'.   These two  
examples are instances of the same problem: how do you take a  
complicated idea and turn it into a sequence of characters?  Sections  
1.3 and 1.4 are two separate ad-hoc solutions, each of which is about  
as complicated as it could get, and both of which present separate  
parsing challenges.

A bold solution to this problem is to straightforwardly rewrite the  
two quoted sentences above in formal terms (yes, in RDF, that's what  
it's for), which can be as expressive and as extensible as we will  
ever need, and then as a conceptually separate step, discuss how to  
serialise that in an XML, VOTable, or FITS file.  The first step is  
well understood, and the second problem is already solved.
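As a rough illustration of the first step, here are the two quoted phrases written as RDF triples and serialised as N-Triples. The sketch uses plain Python so that it does not depend on an RDF library. Several things are assumptions made for illustration, not terms from any existing data model: the namespace, the property names (ucd, and Resolution.PosAngle.Value used as a property), and this reading of the first phrase.

```python
# Sketch: the two quoted phrases as RDF triples, serialised as N-Triples.
# NS and the property names are hypothetical, for illustration only.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NS = "http://example.edu/myns/1.0#"


def iri(s: str) -> str:
    """Write an IRI term in N-Triples syntax."""
    return f"<{s}>"


def literal(s: str) -> str:
    """Write a string literal, escaping backslashes and double quotes
    (newlines are not handled in this sketch)."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# 'the Char.CharacterizationAxis which has UCD X'
AXIS_TRIPLES = [
    ("_:axis", iri(RDF_TYPE), iri(NS + "Char.CharacterizationAxis")),
    ("_:axis", iri(NS + "ucd"), literal("X")),
]

# 'the thing which is the Resolution.PosAngle.Value of a
#  RedshiftFrame.CustomRefPos.Coordinate' (one possible reading)
COORD_TRIPLES = [
    ("_:coord", iri(RDF_TYPE), iri(NS + "RedshiftFrame.CustomRefPos.Coordinate")),
    ("_:coord", iri(NS + "Resolution.PosAngle.Value"), "_:thing"),
]


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


print(to_ntriples(AXIS_TRIPLES + COORD_TRIPLES), end="")
```

Both phrases become the same kind of object, a small set of triples. Adding a new constraint means adding another triple, not inventing another string syntax.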

OK: that would make the spec two pages long, but who's counting?

Best wishes (see you next week!),

Norman

-- 
------------------------------------------------------------
Norman Gray  :  http://nxg.me.uk
eurovotech.org  :  University of Leicester, UK