content, format, ctype, or xtype ?

Norman Gray norman at astro.gla.ac.uk
Tue May 12 08:32:55 PDT 2009


Mark, hello.

On 2009 May 12, at 13:14, Mark Taylor wrote:

>> I didn't see the beginning of this thread (and the archive at
>> <http://www.ivoa.net/forum/dal/> only goes up to 21 April), but if
>> utypes (already sitting above UCDs and still not even defined) are
>> sufficiently incomplete that we require 'ctype' and 'xtype', too,
>
> nobody is suggesting both ctype and xtype, this thread started by
> attempting to come up with a name for a single new item.

Fair enough.  The archive seems to be cropped (I've mailed ESO), so I  
couldn't see where the thread had been, and I can't now go back and  
see the original proposal.

> Two columns could have the same utype (indicating an observation time)
> but different [content/format/xtype/ctype/whatever-it's-called];
> one could be supplied as an ISO-8601 string, and another as an MJD.

Very true, and you make a similar point in the exchange with Paul,  
discussing the units attribute.

These attributes 'units', 'datatype' and 'ctype/xtype/whatever' are  
clearly lexical matters, or something nearby, and -- I agree --  
orthogonal to utypes.

It does, however, seem to me that if there are now three attributes
potentially describing this, then things are getting complicated
enough to suggest that something is being missed.



The XSchema-2 document <http://www.w3.org/TR/xmlschema-2/> is a bit
dense, but is clearly the result of some rather careful thought on
exactly this question.  In its section 2 it describes a datatype as
comprising a 'value space', a 'lexical space' and a set of 'facets'
(which aren't, I think, relevant to this discussion).  The 'value
space' is the set of values that a datatype can take, so that for the
type 'integer' it's the abstract mathematical numbers in Z.  The
'lexical space' is the set of valid literals for the type, so that for
'integer' it's sequences of [0-9] with an optional leading sign.
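
To make the distinction concrete, here is a minimal Python sketch of
the 'integer' case.  The function names are hypothetical, and the
pattern is my reading of the XSchema lexical space (optional sign,
then decimal digits), not something copied from the spec.

```python
import re

# Lexical space (assumed pattern): optional sign, then one or more digits.
INTEGER_LITERAL = re.compile(r"[+-]?[0-9]+")


def is_integer_literal(text):
    """Return True if text belongs to the (assumed) lexical space."""
    return INTEGER_LITERAL.fullmatch(text) is not None


def integer_value(text):
    """Map a literal onto its point in the value space (a Python int)."""
    if not is_integer_literal(text):
        raise ValueError(f"not an integer literal: {text!r}")
    return int(text)
```

Several literals can name the same value: integer_value("42"),
integer_value("+42") and integer_value("042") all return the same
number, while "4.2" is simply not in the lexical space.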

So far, so simple.  The XSchema type 'dateTime' has value space  
consisting of the set of instants in UTC, and its lexical space is  
ISO-8601 strings.  That makes it clear that timescale issues are a  
matter of the 'value space' (in this terminology) and yyyy-mm-dd-etc  
is a matter of the mapping between 'value space' and 'lexical space'.   
Thus, in a schema-validated XML file, the XSchema type of an element
indicates _both_ of these things at once, which is why they don't
seem as separate as they actually are.

Also, this discussion makes it clear -- because XML doesn't concern  
itself with semantics at all -- that these two issues are syntactic  
ones.

Thus, if I understand the issue correctly, as it relates to date/time  
types for example, the problem is:

   1. we _may_ want to have several different 'value spaces' for time,  
corresponding to different timescales (though this in the end may not  
be the most suitable place to indicate this); and

   2. we definitely want to indicate several different 'lexical  
spaces' for time, corresponding for example to MJD vs ISO-8601.
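
Point 2 can be sketched in Python as follows.  This is only an
illustration under loud assumptions: the 'value space' is taken to be
Python's UTC datetime, leap seconds and timescale distinctions (point
1) are ignored entirely, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# MJD 0.0 corresponds to 1858-11-17T00:00:00.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def parse_iso8601(text):
    """Lexical space 1: an ISO-8601 string without zone, assumed UTC."""
    return datetime.fromisoformat(text).replace(tzinfo=timezone.utc)


def parse_mjd(text):
    """Lexical space 2: a decimal Modified Julian Date."""
    return MJD_EPOCH + timedelta(days=float(text))


def to_mjd(instant):
    """Write a value back out in the MJD lexical space, as a float."""
    return (instant - MJD_EPOCH) / timedelta(days=1)
```

Here parse_iso8601("2009-05-12T12:00:00") and parse_mjd("54963.5")
produce the same value, even though the two literals look nothing
alike.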

The same separation is identifiable with STC, where a set of mappings  
to R^n is the 'value space' and STC-S or STC-X is the 'lexical space'.

Because the XSchema spec identifies precisely one 'lexical space' for  
each 'value space', once you have indicated one in that context you  
have indicated the other as well, and so it doesn't require separate  
notation for the two things.  This is the model which the VOTable spec  
appears to be inspired by, in its section 2.1.  However we (appear to)  
want multiple 'lexical spaces' for several of our 'value spaces', and  
so this discussion is about the 'missing notation' of how to indicate  
the 'lexical space' for the 'value space' indicated by the datatype.

Given that, is the solution perhaps to extend the set of datatypes
listed in the VOTable (or analogous) spec, for example to add time and
spatial location, and possibly to extend that further to indicate the
mapping to 'lexical space'?  Thus datatype='time_ISO-8601' would seem
to do the job, with the datatype attribute being essentially
'value-space_optional-lexical-space'.  That seems fairly clear, even
intuitive, and is principled enough that I think we can be fairly sure
that we're not missing something else.

I imagine that units could be incorporated into this as well, if there  
were interest in that.  How about datatype='angle/deg'?
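
As a rough sketch of how a reader of such an attribute might split it
up, the following Python assumes that '_' separates value space from
lexical space and that '/' introduces a unit.  Combining the two
separators in a single grammar is my assumption for illustration, and
nothing here is part of the VOTable spec; the names are hypothetical.

```python
from typing import NamedTuple, Optional


class DatatypeSpec(NamedTuple):
    value_space: str
    lexical_space: Optional[str]
    unit: Optional[str]


def parse_datatype(attr: str) -> DatatypeSpec:
    """Split 'value-space[_lexical-space][/unit]' into its parts."""
    head, _, unit = attr.partition("/")
    value_space, _, lexical_space = head.partition("_")
    if not value_space:
        raise ValueError(f"no value space in datatype: {attr!r}")
    # Empty strings mean the optional part was absent.
    return DatatypeSpec(value_space, lexical_space or None, unit or None)
```

With this, parse_datatype("time_ISO-8601") gives value space 'time'
and lexical space 'ISO-8601', while parse_datatype("angle/deg") gives
value space 'angle' and unit 'deg'.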

Best wishes,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester


