A suggested revision for UCDs
Norman Gray
norman at astro.gla.ac.uk
Wed Oct 22 10:43:48 PDT 2003
Greetings, all, and Tom in particular.
On Tue, 21 Oct 2003, Thomas McGlynn wrote:
>
> A few minutes ago I uploaded a version of my suggested revised
> proposal for UCDs to the Twiki. This is just a Word version since
> I don't have a PDF generator handy. The URL is
> http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9b.doc
I've appended a (longish) set of comments below. I've just noticed that
Bob has forwarded a long set of comments to the list. I haven't read
those yet.
By the way, I notice that this announcement/discussion has been posted
to no fewer than _three_ lists, namely ucd, dm and dal. It would be
at the very least neater if it were on only one -- ucd at ivoa.net is the
obvious one. What do folk think -- are there folk on the other two
lists who have an interest in this and aren't on the ucd at ivoa.net
list?
I'm sure I'm not the only one to find Tom's proposal very
thought-provoking. The suggestions bring up several new use-cases;
and the idea of the `local' atom in particular is valuable, and fills a
gap in the 1.9.9 proposals (though I'd put it in a different place).
I think there are very likely several places in the 1.9.9 proposals
which are underspecified, and some where I personally would probably
explain things slightly differently from Roy and Sebastien, but these
are editorial matters.
I have a few difficulties with some aspects of Tom's proposal, however,
which I'll discuss here, and add a few more general remarks at the end.
I'm speaking for myself of course, rather than the group of authors,
and thus it's probable that my opinion and interpretation of some 1.9.9
points is at variance with others in the group, or goes beyond what the
document aims to say (which would be a useful datapoint).
Most urgent, I think, is Tom's discussion, in his section 4.5, of the
distinctions between his proposals and the 1.9.9 ones. This discussion is
crucial, since the criticisms in it are what would ultimately justify
replacing the 1.9.9 proposals with Tom's more complicated ones.
In the 1.9.9 proposals, the function of a word is always the same:
some things such as `src' are concepts (and only concepts), and
every other word names a property. The distinction is that
concepts can't have a value, but can have properties; and a property
always has a value. Now, the property;concept _pair_ also names a
concept, which can therefore have properties in turn (in principle this
has the same potential as Tom's proposals for generating long UCDs,
though that is probably very unlikely in practice). There will
doubtless be some rather formal language which makes this cast-iron,
but it's actually fairly intuitive once you get the property/concept
dichotomy and read `;' as `of a' or something like that.
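To make the `of a' reading concrete, here is a minimal Python sketch. The function name read_aloud and the example UCDs are illustrative assumptions, not part of the 1.9.9 draft; the only thing taken from the paragraph above is that `;' separates a property from the concept it belongs to.

```python
def read_aloud(ucd: str) -> str:
    """Spell out a 1.9.9-style UCD by reading each ';' as 'of a'.

    Hypothetical helper for illustration only; it does no validation
    beyond rejecting empty words.
    """
    words = [w.strip() for w in ucd.split(";")]
    if not all(words):
        raise ValueError(f"empty word in UCD: {ucd!r}")
    return " of a ".join(words)


print(read_aloud("stat.err;phot.flux"))
print(read_aloud("stat.max;stat.err;phot.flux"))
```

The left-most word is always the property whose value is being annotated; everything to its right names the concept that property belongs to.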
Section 3.1 in the 1.9.9 proposals -- the crucial section of the document,
for which everything else is to some extent just scaffolding, and without
which the rest of the document makes rather less sense -- is what attempts
to describe this. Perhaps that explanation needs work. At any rate, I do
not believe that one has to sign up to the (basically ontology-inspired)
language in that section in order to use the UCDs thus justified.
Indeed, it might be useful for that section to be split into two, one to
communicate the underlying idea to folk who simply want to _use_ UCDs,
and another to reexpress it more formally for the ontology enthusiasts.
In his section 4.5, Tom also remarks that ``Indeed I'm not sure that any
string of words can be determined to be illegal in the old scheme''.
I'd probably agree in outline: there are significantly fewer rules
necessary in the 1.9.9 proposals than in Tom's proposals. The only place
a base concept can go is in the right-most position, and thus you can't
have a concept sitting on its own, since the left-most position is the
name of the property, the value of which is the number/column/whatever
which has been annotated by this UCD (the syntactic mechanism for making
that annotation is outside of scope for the UCD proposals, I'd think).
Also, there are some property-concept pairs that make no sense, such
as stat.err;src. But that's about it -- you don't need any more rules
than that.
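The handful of rules just listed could be checked along the following lines. This is a sketch under stated assumptions: the set of base concepts contains only `src' because that is the only one named here, the table of nonsense pairs contains only the stat.err;src example, and the function name problems is hypothetical. A real checker would take both tables from the agreed word list.

```python
# Assumption: a stand-in vocabulary. Only `src' is named in the discussion
# as a pure concept; a real list would come from the UCD word list.
BASE_CONCEPTS = {"src"}

# Property;concept pairs that are known to make no sense (example from the text).
NONSENSE_PAIRS = {("stat.err", "src")}


def problems(ucd: str) -> list:
    """Return a list of rule violations for a ';'-separated UCD."""
    words = ucd.split(";")
    found = []
    # Rule 1: a base concept may appear only in the right-most position.
    for word in words[:-1]:
        if word in BASE_CONCEPTS:
            found.append(f"{word}: base concept not in right-most position")
    # Rule 2: a base concept cannot stand alone, because the left-most
    # word must name the property whose value is being annotated.
    if len(words) == 1 and words[0] in BASE_CONCEPTS:
        found.append(f"{words[0]}: base concept with no property")
    # Rule 3: some adjacent property;concept pairs are meaningless.
    for left, right in zip(words, words[1:]):
        if (left, right) in NONSENSE_PAIRS:
            found.append(f"{left};{right}: pair makes no sense")
    return found


for candidate in ["stat.err;phot.flux", "src", "src;phot.flux", "stat.err;src"]:
    print(candidate, "->", problems(candidate) or "ok")
```

Note that all three checks need only the list of base concepts and the table of nonsense pairs; no word has to be classified further than "base concept or not".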
Tom constructs an `arith.diff;arith.sum;phot.flux;...' UCD. That does
look unwieldy (but note there's no need for parentheses in the 1.9.9
proposals), but I get the impression that the `arith' UCD tree was to
some extent a kite being flown, and I for one would be surprised if it
made it much beyond this version, partly because it would seem to encourage
such odd-looking UCDs. Also, there's no tying of one table to another
in the 1.9.9 proposals -- I'd think that was out of scope for UCD (and
quite properly so: I'll mention this below).
The 1.9.9 proposals allow no ambiguity in the way that UCDs are
written: properties queue up in front of the single base concept, and
ordering matters, so that stat.max;stat.err;phot.flux is different
from stat.err;stat.max;phot.flux.
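One way to see why the order of the properties matters is to fold the UCD into nested pairs from the right, so that each property applies to everything after it. The following Python sketch does that; the function name nest and the tuple representation are assumptions made for illustration, not something the 1.9.9 draft specifies.

```python
def nest(ucd: str):
    """Fold 'p1;p2;base' into (p1, (p2, base)), working right to left."""
    words = ucd.split(";")
    concept = words[-1]
    for prop in reversed(words[:-1]):
        # Each property;concept pair names a new concept in turn.
        concept = (prop, concept)
    return concept


a = nest("stat.max;stat.err;phot.flux")
b = nest("stat.err;stat.max;phot.flux")
print(a)
print(b)
print(a == b)
```

On this reading the first UCD is the maximum of the error on a flux, and the second is the error on a maximum flux, so the two nested structures are not equal.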
More specific points in Tom's proposal, in document order rather than
any other (section references are to Tom's document):
Section 4.1: Bringing the number of terms up to three -- concept,
attribute and modifier -- reminds me of the qualifier/modifier idea
that was in previous versions of the draft, which I still think is an
unstable distinction, and which Roy and Sebastien thankfully managed
to get rid of by simplifying the syntax down to just concept plus
properties (but see below). Also, there's no syntactic distinction
between modifiers and attributes, so in order to apply the extra
ordering rules for those, or even to break the UCD into its three
parts, you have to know which words are of which type. That is, you
can't do it at parse time.
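The parse-time point can be illustrated with a small Python sketch. It does not reproduce Tom's actual syntax: the word-kind table below is an assumption that simply lists the modifier and attribute names mentioned in this message, and classify is a hypothetical helper. The point is only that the labelling depends on a lookup table and not on the shape of the string.

```python
# Assumption: kinds taken only from the words mentioned in this message.
WORD_KINDS = {
    "intent": "modifier",
    "value": "attribute",
    "vector": "attribute",
    "instance": "attribute",
    "multiplet": "attribute",
}


def classify(words, kinds):
    """Label each word using a vocabulary table; unlisted words stay unknown."""
    return [(word, kinds.get(word, "unknown")) for word in words]


words = ["intent", "vector"]
print(classify(words, WORD_KINDS))  # with the vocabulary to hand
print(classify(words, {}))          # without it, nothing can be decided
```

A parser that has only the string, and no up-to-date copy of the vocabulary, is in the position of the second call.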
Section 4.1.2 (not an important point, I don't think): I'm puzzled at
the requirement that words in the non-standard namespace must be
distinct from all words in the IVOA namespace. The point of having a
namespace is to make this possible, or (since such duplication would
surely be condemned as bad practice) at least not an error. The rule
also means that if a new word were added to the IVOA namespace which
happened to match a word in a private namespace, the namespaced UCDs
would thereby suddenly become invalid, with no change in the spec.
Section 4.2.2: The `intent' modifier has no corresponding notion in
the 1.9.9 proposals, and it's not clear to me where in those proposals
it would fit in; I think this is a _problem_ for the 1.9.9
proposals. I can see how it would fit in to what I take the
underlying 1.9.9 model to be, but not into the serialisation of that
model that the 1.9.9 syntax represents. I can see three approaches to
this problem within the general framework of the 1.9.9 proposals. (i)
Rule it out of scope: it's not UCD's problem to talk about what values
are intended to be, since they're only for data discovery, and are not
required to be capable of driving analysis, so that if this `intent'
distinction matters to you, you're going to have to understand the utype
somehow. (ii) Add modifiers like this to the 1.9.9 model and syntax:
that's potentially quite a lot of work, since it would require
thinking very clearly about just what the distinction is between
modifiers and properties, _and_ working out a usable syntax for adding
them in -- they _have_ to be distinguishable at parse time. (iii)
Think about it more and discover a way they can be viewed as
properties in a principled way. The point isn't just about this
`intent' modifier: if we can convince ourselves that there are things
like `intent' (and that they're in scope) which are in principle
qualitatively distinct from properties (and I would at least dispute
that `em' and `frame' count here), then that has to be dealt with.
Perhaps this example will help us find the stable distinction between
`qualifiers' and `modifiers' that escaped us in earlier versions.
Section 4.2.3: The `value', `vector', `instance' and `multiplet'
attributes seem overly complicated. The `value' attribute is not
required in the 1.9.9 proposals because all properties have a value,
namely the value they're being used to annotate. The other three seem
artefacts of the `complex UCDs' which Tom is introducing in these
proposals. These complex UCDs seem problematic to me because they
seem tightly bound to VOTable. That destroys the orthogonality of the
UCD and VOTable specs (the W3C has had _terrible_ trouble with
non-orthogonal specs, tying itself in knots trying to resolve their
dependencies on each other), and makes it harder to use UCDs in other
contexts, such as queries. I feel that UCDs should be seen as
annotating a `thing', whether that `thing' be a value, a column, a
group, or a query `phrase', and it should be the responsibility of
whatever defines the syntax of that annotation (that is, VOTable or
SIA) to define precisely what the thing is that the annotation applies
to. Thus, VOTable might say that when a UCD appears in a <field> then
it indicates a set of relationships between the corresponding entries
of the table; when it appears in a <group> it means something
different; and so on. Dealing with the typing and complexity issues
of this in a general way within the UCD spec would surely make it
impossibly unwieldy and limit its scope. This is also a general worry
for all of Tom's Section 5; I really think this should be out of scope
for UCD, to the extent that Tom's ``The grouping does not describe the
semantics of the relationship. That is the role of UCDs'' would be
much better as ``The grouping describes (some of?) the semantics of
the relationship. That is not the role of UCDs''. This is a can of
worms.
Section 4.2.3 (local): I agree this is a gap in the 1.9.9 proposals.
Another way of dealing with it would be to say that a UCD <word>
`local.X' meant exactly the same as the <word> `X', but was not
comparable with it.
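A minimal Python sketch of that alternative follows. The helper names are hypothetical, and one detail is an assumption that goes beyond what is said above: two local words with the same spelling are also treated as not comparable, on the grounds that each is local to its own dataset.

```python
LOCAL_PREFIX = "local."


def is_local(word: str) -> bool:
    return word.startswith(LOCAL_PREFIX)


def base_meaning(word: str) -> str:
    """The word whose meaning a (possibly local) word carries."""
    return word[len(LOCAL_PREFIX):] if is_local(word) else word


def matches(a: str, b: str) -> bool:
    """True only when two words may be compared across datasets and agree.

    Assumption: any local word is treated as not comparable, even with
    another local word of the same spelling.
    """
    return not is_local(a) and not is_local(b) and a == b


print(base_meaning("local.phot.flux"))
print(matches("local.phot.flux", "phot.flux"))
print(matches("phot.flux", "phot.flux"))
```

So a reader can still learn from `local.X' what sort of quantity is meant, but a search or cross-match on `X' would not pick it up.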
More general points:
Tom's document seems to discuss his proposals in object terms.
However the property-concept parts of the UCD proposal are _not_ an
object model, and if you cram them into an object model, they won't
fit, and the result will inevitably look like a mess, and look
backwards. The model is simpler than this, however: things which are
purely concepts (such as `src') don't have values. Concepts do have
properties though, and these properties have numeric values, namely
the numeric values we're trying to annotate with this UCD.
As regards ordering, yes, as Tom said, it doesn't fundamentally
matter, and it's just a matter of syntax, rather than of the model.
However, having the property first seems natural, since it's the
property which possesses the numerical value being annotated, and
so it's the property which I would have thought is best shown
up-front.
Now, there is a _vague_ object model implicit in the construction of
the UCD words like `pos.eq.ra', but this is only because, along with
the replacement of underscores with dots, came the explicit freedom to
crop each word at a dot from the right, and use the result as a UCD
word also. This prompts a natural perception of the words as
hierarchical, or object-oriented if you must. The actual words are
basically little changed from the original UCDs, though there's a
review of these under way. These words weren't the main point of the
UCD2 proposals.
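The cropping rule is simple enough to sketch in a few lines of Python. The function name croppings is hypothetical; the rule itself (crop at a dot from the right, and the result is also a UCD word) is the one described above.

```python
def croppings(word: str) -> list:
    """List a dotted UCD word followed by each right-cropped form of it."""
    parts = word.split(".")
    return [".".join(parts[:n]) for n in range(len(parts), 0, -1)]


print(croppings("pos.eq.ra"))
```

For `pos.eq.ra' this gives the word itself, then `pos.eq', then `pos', which is what prompts the hierarchical reading.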
At present these words are those mined from the column names actually
occurring in the databases in the CDS collection; they are thus
unprincipled. Whether this is a good or a bad thing is an open question.
I'm sure it is this which causes some people (I'm thinking of Gerard
Lemson and Pat Dowler) to gasp and, in their poster, pick out pos_eq_ra
for special deprecation as incoherent. If you believe that principled
generation of UCD words would be a Good Thing (and that would probably
be my prejudice), then I suspect that paths in (say) Gerard and Pat's
model would be a good way to do it (do Gerard and Pat claim that every
UCD word is thus expressible?). If you believe, on the other hand, that
the mined nature of the words is of primary importance (and I can see
the force of that, too), then they might need little more than a review
or tidy-up, to make sure that the `croppability' is reasonable in fact,
and that the implications, or suggestions, of the words chosen do in
fact fit in with a properties-based model (or whatever we end up with).
Phew! I think that's probably quite enough for just now -- I should
let someone else get a word in.
All the best,
Norman
--
---------------------------------------------------------------------------
Norman Gray http://www.astro.gla.ac.uk/users/norman/
Physics and Astronomy, University of Glasgow, UK norman at astro.gla.ac.uk