Beyond the draft proposal
Norman Gray
norman at astro.gla.ac.uk
Mon Feb 4 13:48:58 PST 2008
Brian and all, hello.
On 2008 Feb 4, at 18:01, Brian Thomas wrote:
> On Monday 04 February 2008 10:55:04 am Frederic Hessman wrote:
>> Starting to think beyond the IVOA draft proposal:
>>
>> Right now, the IVOAT vocabulary (a cleaned-up version of the old IAU
>> thesaurus) doesn't really cover everything one might need, e.g.
>> there are the following (expressed as tokens)
>>
>> JohnsonPhotometry
>> RMagnitude
>> Filter
>>
>> so don't we really need
>>
>> RFilter
>>
>> or even
>>
>> JohnsonRFilter?
>
> Yes. There are too many 'compound' terms. These may generally be
> identified by the multiple words which comprise the token.
I smell mission creep!
Remember that we're defining _vocabularies_ here. One of the main
distinctions between vocabularies and ontologies is that the former
serve a different goal from the latter. That goal is searching, or
something very like it; vocabularies are much closer to humans -- to
UIs -- than ontologies are, and in consequence they are inevitably
messier.
The concepts in an ontology are, as Rick, Brian and Ed rightly say,
much more atomic; they're assembled through careful, logical,
principled thought, but are as a consequence remote from users'
experience and expectations.
Please be reassured that I'm not at all deprecating the ontologies
work that you and the CDS folk are doing. I hope to be involved with,
and anticipate benefitting from, IVOA ontologies in future, but when I
do so I'll be doing something different from what we're doing with
vocabularies.
> e.g. to get "JohnsonRFilter" one simply creates
> "Filter+JohnsonPhotometry+RMagnitude" (and I suppose order is
> important here). I'd rather not postulate further about the
> difficulties/needs of this hypothetical system...it is just an
> example! Rather, the point here is that I think this is something we
> should not worry about.
This is true, but it's 'JohnsonRFilter' that astronomers will actually
think of and search for, and not a formal intersection of three
separate sets.
Vocabularies have to comprise the terms that users actually use,
minimally tidied up. The result may be messy and hard to reason with,
but that's OK, because the world is messy, and we don't want to reason
with vocabularies.
> We definitely should try to remove compound terms which may be created
> from 'atomic' stuff already present in the vocabulary.
In an ontology, yes. But ontologies are made; vocabularies are
discovered.
It may be useful to create a vocabulary from an ontology. This is
done in some health-care vocabularies, where there are lots and lots
of terms like UpperArmFracture, LowerArmFracture, UpperLegFracture and
LowerLegFracture. In this case, it makes sense to start with an
appropriate ontology (which is, for example, much easier to maintain)
and generate the vocabulary from it. It's this "pre-coordinated
ontology" (see Google) that's actually delivered to, and used by, users.
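For anyone who hasn't met the idea, here is a minimal Python sketch of pre-coordination. It assumes an ontology reduced to a few hypothetical, independent facets (real clinical systems are far more elaborate). Every compound vocabulary term is generated as one combination of atomic concepts, and keeps a note of the atoms it came from, so that it's the small ontology that gets maintained:

```python
from itertools import product

# Hypothetical atomic facets of a tiny "ontology" (illustration only).
FACETS = {
    "position": ["Upper", "Lower"],
    "limb": ["Arm", "Leg"],
    "condition": ["Fracture"],
}

def precoordinate(facets):
    """Return every compound term formed by concatenating one value from
    each facet (in facet order), mapped to the atomic concepts behind it."""
    names = list(facets)
    vocabulary = {}
    for combo in product(*(facets[name] for name in names)):
        term = "".join(combo)                    # e.g. "UpperArmFracture"
        vocabulary[term] = dict(zip(names, combo))
    return vocabulary

vocab = precoordinate(FACETS)
print(sorted(vocab))
# ['LowerArmFracture', 'LowerLegFracture', 'UpperArmFracture', 'UpperLegFracture']
```

Users only ever see the flat, familiar terms; adding one value to a facet regenerates all the affected compounds at once.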
This appears to go against the remark above that vocabularies are
discovered, not made[1]. The difference is that in this case, of
health-care informatics, the original natural-language vocabulary was
actually extremely well structured, which meant the 'tidying up' step
could be very profound (and the underlying reason for that is that
this was actually "US health-care _billing_ informatics"; if
telescopes had fine-grained billing for observations, I suspect we'd
have something very similar ourselves).
We can return, here, to Rick's assertion that the form of the 'master'
of a vocabulary doesn't matter. If it's possible to generate
something roughly like the IAU93 vocabulary from some more principled
ontology, then excellent, let's do that, but we shouldn't hold up the
process while that master is developed.
And I think the result _should_ look much like the IAU original. My
impression of what was being aimed at in the IVOAT was a tidied-up and
updated IAU93. Let's keep it simple and quick.
We can afford to keep it simple. Since a key feature of this current
effort is the mappings that let us handle multiple vocabularies, we
can concentrate on getting something simple but useful out now,
without prejudicing our support for a fuller and more ontology-
principled competitor later. The mapping support means we don't have
to get all the vocabularies right first time.
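As an illustration of why the mappings take the pressure off, here is a minimal Python sketch. It assumes hypothetical term-to-term mappings between the IVOAT and some later vocabulary, loosely in the spirit of SKOS mapping relations such as skos:exactMatch; the vocabulary name 'future' and the terms are invented. A search in one vocabulary is expanded through the mappings, so resources tagged with either vocabulary are found:

```python
# Hypothetical cross-vocabulary mappings: (vocabulary, term) -> set of
# (vocabulary, term) pairs considered equivalent. Invented for illustration.
MAPPINGS = {
    ("ivoat", "Galaxies"): {("future", "Galaxy")},
}

def expand(vocab, term, mappings):
    """Return the query term plus every term reachable by following
    the mappings transitively."""
    seen = {(vocab, term)}
    frontier = [(vocab, term)]
    while frontier:
        current = frontier.pop()
        for target in mappings.get(current, set()):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen

def search(index, vocab, term, mappings):
    """index maps (vocabulary, term) -> set of resource identifiers."""
    hits = set()
    for key in expand(vocab, term, mappings):
        hits |= index.get(key, set())
    return hits

index = {
    ("ivoat", "Galaxies"): {"doc1"},
    ("future", "Galaxy"): {"doc2"},
}
print(sorted(search(index, "ivoat", "Galaxies", MAPPINGS)))  # ['doc1', 'doc2']
```

The mappings here are one-directional; a symmetric relation like exactMatch would simply be recorded both ways. The point is that resources tagged with the simple vocabulary stay findable when a fuller one arrives later.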
As a separate thing:
> And I would also like to add that I'd like to see a *dictionary* of
> the vocabulary terms. This then would settle the semantic meaning of
> these tokens, which is the crucial missing link between vocabulary
> and ontology usage. I have been rebuffed/ignored about adding the
> definitions to the SKOS vocabulary
Have you? Gracious, no: I think it's important to have scope-notes in
the vocabulary where possible. The only problem here is that the
definitions in the IAU93 (coming back to that) are rather terse, and
in many cases are just the IAU93 term translated to lowercase. In
that case, however, an elaborate scope note may not be vital, because
most of those terms are immediately intelligible to the intended
community (i.e., astronomers) to the degree of precision appropriate to
a vocabulary (as opposed to the degree necessary for an ontology).
> which identify things which should be 'cleaned up' (probably by
> splitting the offending token into 2 or more other tokens each with
> separate meaning).
I'm all for cleaning things up, but I think we need to be vigilant
against this tidyup turning into a full-blown ontological exercise
which, as the DM group can tell us, could end up taking five years
before anyone notices it's late. How about an effective definition of
'cleaned up enough for the IVOAT initial release' being 'whatever can
get done in the next month'?
All the best,
Norman
[1] I'm sure I'm getting an echo of something theological, here.
--
Norman Gray : http://nxg.me.uk
eurovotech.org : University of Leicester