content, format, ctype, or xtype ?
Norman Gray
norman at astro.gla.ac.uk
Tue May 12 08:32:55 PDT 2009
Mark, hello.
On 2009 May 12, at 13:14, Mark Taylor wrote:
>> I didn't see the beginning of this thread (and the archive at
>> <http://www.ivoa.net/forum/dal/> only goes up to 21 April), but if
>> utypes (already sitting above UCDs and still not even defined) are
>> sufficiently incomplete that we require 'ctype' and 'xtype', too,
>
> nobody is suggesting both ctype and xtype, this thread started by
> attempting to come up with a name for a single new item.
Fair enough. The archive seems to be truncated (I've mailed ESO), so I
couldn't see how the thread had developed, and I can't now go back and
see the original proposal.
> Two columns could have the same utype (indicating an observation time)
> but different [content/format/xtype/ctype/whatever-it's-called];
> one could be supplied as an ISO-8601 string, and another as an MJD.
Very true, and you make a similar point in the exchange with Paul,
discussing the units attribute.
These attributes 'units', 'datatype' and 'ctype/xtype/whatever' are
clearly lexical matters, or something nearby, and -- I agree --
orthogonal to utypes.
It does seem to me, however, that if there are now three attributes
potentially describing this, things are getting complicated enough to
suggest that something is being missed.
The XSchema-2 document <http://www.w3.org/TR/xmlschema-2/> is a bit
dense, but is clearly the result of some rather careful thought on
this exact same question. It describes a datatype as comprising a
'value space', a 'lexical space' and a set of 'facets' (which aren't,
I think, relevant to this discussion) (section 2). The 'value space' is
the set of values that a datatype can take, so that for the type
'integer' it's the abstract mathematical integers, Z. The 'lexical
space' is the set of valid literals for a datatype, so that for
'integer' it's optionally signed sequences of [0-9].
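To make the distinction concrete, here is a minimal Python sketch of the mapping from the 'integer' lexical space to its value space. The function name is hypothetical and nothing here comes from a VO library; it only illustrates that the mapping is many-to-one, so several distinct literals denote the same value.

```python
import re

# Lexical space of xs:integer: an optional sign followed by decimal digits.
# (Whitespace handling, which XSchema also specifies, is ignored here.)
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def integer_value(literal: str) -> int:
    """Map a literal in the lexical space to its value in the value space Z."""
    if not INTEGER_LEXICAL.fullmatch(literal):
        raise ValueError(f"not in the lexical space of integer: {literal!r}")
    return int(literal)

# Three different literals, one value.
print({integer_value(s) for s in ["7", "+7", "007"]})  # {7}
```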
So far, so simple. The XSchema type 'dateTime' has a value space
consisting of the set of instants in UTC, and its lexical space is
ISO-8601 strings. That makes it clear that timescale issues are a
matter of the 'value space' (in this terminology), and that
yyyy-mm-dd-etc is a matter of the mapping between 'value space' and
'lexical space'.
Thus, what the XSchema type of an element does, in a schema-validated
XML file, is indicate _both_ of these things at once, which is why they
don't seem as separate as they actually are.
Also, this discussion makes it clear -- because XML doesn't concern
itself with semantics at all -- that these two issues are syntactic
ones.
Thus, if I understand the issue correctly, as it relates to date/time
types for example, the problem is:
1. we _may_ want to have several different 'value spaces' for time,
corresponding to different timescales (though this in the end may not
be the most suitable place to indicate this); and
2. we definitely want to indicate several different 'lexical
spaces' for time, corresponding for example to MJD vs ISO-8601.
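Point 2 can be illustrated with a small Python sketch: two lexical spaces (ISO-8601 strings and MJD numbers) mapping into one value space of instants. This is an assumption-laden illustration, not a proposed implementation: it takes MJD 0 as 1858-11-17T00:00 and deliberately treats both representations as being in the same timescale (UTC here), sidestepping point 1 entirely. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: MJD counts days from 1858-11-17T00:00, and both lexical forms
# are read in the same timescale (treated as UTC); timescales are ignored.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)

def from_iso8601(literal: str) -> datetime:
    """Lexical space 1: ISO-8601 strings (expected to carry a UTC offset)."""
    return datetime.fromisoformat(literal)

def from_mjd(mjd: float) -> datetime:
    """Lexical space 2: Modified Julian Date, in days."""
    return MJD_EPOCH + timedelta(days=mjd)

# Two different literals, one instant in the value space.
print(from_iso8601("2009-05-12T12:00:00+00:00") == from_mjd(54963.5))  # True
```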
The same separation is identifiable with STC, where a set of mappings
to R^n is the 'value space' and STC-S or STC-X is the 'lexical space'.
Because the XSchema spec identifies precisely one 'lexical space' for
each 'value space', once you have indicated one in that context you
have indicated the other as well, and so it doesn't require separate
notation for the two things. This is the model which the VOTable spec
appears to be inspired by, in its section 2.1. However, we (appear to)
want multiple 'lexical spaces' for several of our 'value spaces', and
so this discussion is about the 'missing notation' of how to indicate
the 'lexical space' for the 'value space' indicated by the datatype.
Given that, is the solution perhaps to extend the set of datatypes
listed in the VOTable (or analogous) spec, for example to add time and
spatial location, and possibly to extend that further to indicate the
mapping to 'lexical space'? Thus datatype='time_ISO-8601' would seem
to do the job, with the datatype attribute being essentially
'value-space_optional-lexical-space'. That seems fairly clear, even
intuitive, and is principled enough that I think we can be fairly sure
that we're not missing something else.
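As a rough sketch of how a reader might consume such an attribute, the Python below splits datatype='value-space_optional-lexical-space' at the first underscore and looks up a parser for the pair. Everything here is hypothetical: the registry, the function names, and the 'time_MJD' value (only 'time_ISO-8601' is suggested above). The MJD handling carries the same epoch and timescale assumptions as before.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)  # assumed MJD origin

# Hypothetical registry keyed by (value space, lexical space or None).
PARSERS: dict[tuple[str, Optional[str]], Callable[[str], object]] = {
    ("int", None): int,
    ("time", "ISO-8601"): datetime.fromisoformat,
    ("time", "MJD"): lambda s: MJD_EPOCH + timedelta(days=float(s)),
}

def split_datatype(attr: str) -> tuple[str, Optional[str]]:
    """Split 'value-space_optional-lexical-space' at the first underscore."""
    value_space, sep, lexical_space = attr.partition("_")
    return value_space, (lexical_space if sep else None)

def parse_cell(datatype: str, text: str) -> object:
    """Parse one cell's text according to its (hypothetical) datatype."""
    key = split_datatype(datatype)
    if key not in PARSERS:
        raise ValueError(f"no lexical mapping for datatype={datatype!r}")
    return PARSERS[key](text)

print(parse_cell("time_MJD", "54963.5"))  # 2009-05-12 12:00:00+00:00
```

One consequence of this naming scheme is that value-space names must not themselves contain an underscore, since the first underscore is taken as the separator.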
I imagine that units could be incorporated into this as well, if there
were interest in that. How about datatype='angle/deg'?
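A similarly hedged sketch of the units idea: treat datatype='angle/deg' as value space 'angle' plus unit 'deg', and convert every literal to one canonical unit (radians, chosen arbitrarily here) so that columns in different units land in the same value space. The conversion table and function name are illustrative assumptions only.

```python
import math

# Hypothetical table: (value space, unit) -> conversion to a canonical unit.
TO_CANONICAL = {
    ("angle", "deg"): math.radians,
    ("angle", "rad"): lambda x: x,
}

def to_canonical(datatype: str, literal: str) -> float:
    """Interpret 'value-space/unit' and return the value in canonical units."""
    value_space, _, unit = datatype.partition("/")
    try:
        convert = TO_CANONICAL[(value_space, unit)]
    except KeyError:
        raise ValueError(f"unknown datatype {datatype!r}") from None
    return convert(float(literal))

print(math.isclose(to_canonical("angle/deg", "180"), math.pi))  # True
```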
Best wishes,
Norman
--
Norman Gray : http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester