Beyond the draft proposal

Norman Gray norman at astro.gla.ac.uk
Mon Feb 4 13:48:58 PST 2008


Brian and all, hello.

On 2008 Feb 4, at 18:01, Brian Thomas wrote:

> On Monday 04 February 2008 10:55:04 am Frederic Hessman wrote:
>> Starting to think beyond the IVOA draft proposal:
>>
>> Right now, the IVOAT vocabulary (a cleaned-up version of the old IAU
>> thesaurus) doesn't really cover everything one might need, e.g. there
>> are the following (expressed as tokens)
>>
>> 	JohnsonPhotometry
>> 	RMagnitude
>> 	Filter
>>
>> so don't we really need
>>
>> 	RFilter
>>
>> or even
>>
>> 	JohnsonRFilter?
>
> Yes, there are too many 'compound' terms. These may generally be
> identified by the multiple words which comprise the token.

I smell mission creep!

Remember that we're defining _vocabularies_ here.  One of the main  
distinctions between vocabularies and ontologies is that the former  
serve a different goal from the latter.  That goal is searching, or  
something very like it; vocabularies are much closer to humans -- to  
UIs -- than ontologies are, and in consequence they are inevitably  
messier.

The concepts in an ontology are, as Rick, Brian and Ed rightly say,  
much more atomic; they're assembled through careful, logical,  
principled thought, but are as a consequence remote from users'  
experience and expectations.

Please be reassured that I'm not at all deprecating the ontologies  
work that you and the CDS folk are doing.  I hope to be involved with,  
and anticipate benefitting from, IVOA ontologies in future, but when I  
do so I'll be doing something different from what we're doing with  
vocabularies.

> e.g. to get "JohnsonRFilter" one simply creates
> "Filter+JohnsonPhotometry+RMagnitude" (and I suppose order is
> important here). I'd rather not postulate further about the
> difficulties/needs of this hypothetical system...it is just an
> example! Rather, the point here is that I think this is something we
> should not worry about.

This is true, but it's 'JohnsonRFilter' that astronomers will actually  
think of and search for, and not a formal intersection of three  
disjoint sets.

Vocabularies have to comprise the terms that users actually use,  
minimally tidied up.  The result may be messy and hard to reason with,  
but that's OK, because the world is messy, and we don't want to reason  
with vocabularies.

> We definitely should try to remove compound terms which may be created
> from 'atomic' stuff already present in the vocabulary.

In an ontology, yes.  But ontologies are made; vocabularies are  
discovered.

It may be useful to create a vocabulary from an ontology.  This is  
done in some health-care vocabularies, where there are lots and lots  
of terms like UpperArmFracture, LowerArmFracture, UpperLegFracture and  
LowerLegFracture.  In this case, it makes sense to start with an  
appropriate ontology (which is for example much easier to maintain)  
and generate the vocabulary from this.  It's this "pre-coordinated  
ontology" (see Google) that's actually delivered to, and used by, users.

This appears to go against the remark above that vocabularies are
discovered, not made[1].  The difference is that in this case, of
health-care informatics, the original natural-language vocabulary was
actually extremely well structured, which meant the 'tidying up' step
could be very profound.  The underlying reason for that is that this
was actually "US health-care _billing_ informatics"; if telescopes had
fine-grained billing for observations, I suspect we'd have something
very similar ourselves.
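
As a rough illustration of the health-care case above, here is a
minimal Python sketch of generating a pre-coordinated vocabulary from a
handful of atomic concepts.  The three lists stand in for an ontology
and are hypothetical, as is the function name; a real ontology would
also say which combinations are meaningful.

```python
from itertools import product

# Hypothetical atomic concepts, a tiny stand-in for a real ontology.
POSITIONS = ["Upper", "Lower"]
LIMBS = ["Arm", "Leg"]
CONDITIONS = ["Fracture"]


def precoordinate(positions, limbs, conditions):
    """Map each generated compound term to the atomic concepts behind it."""
    return {
        f"{position}{limb}{condition}": (position, limb, condition)
        for position, limb, condition in product(positions, limbs, conditions)
    }


vocabulary = precoordinate(POSITIONS, LIMBS, CONDITIONS)
for term in sorted(vocabulary):
    print(term)
```

Users only ever see the compound terms; the atomic parts are kept
alongside so that the master stays easy to maintain.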

We can return, here, to Rick's assertion that the form of the 'master'  
of a vocabulary doesn't matter.  If it's possible to generate  
something roughly like the IAU93 vocabulary from some more principled  
ontology, then excellent, let's do that, but we shouldn't hold up the  
process while that master is developed.

And I think the result _should_ look much like the IAU original.  My  
impression of what was being aimed at in the IVOAT was a tidied up and  
updated IAU93.  Let's keep it simple and quick.

We can afford to keep it simple.  Since a key feature of this current  
effort is the mappings that let us handle multiple vocabularies, we  
can concentrate on getting something simple but useful out now,  
without prejudicing our support for a fuller and more ontology- 
principled competitor later.  The mapping support means we don't have  
to get all the vocabularies right first time.
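
To make the role of the mappings concrete, here is a minimal Python
sketch of translating a search term from one vocabulary into another.
The relation names are modelled on the SKOS mapping properties; the
"other:" terms, the table and the function name are all hypothetical.

```python
# Hypothetical mappings from IAU93-style tokens to a second vocabulary.
# Each entry pairs a SKOS-style mapping relation with a target term.
MAPPINGS = {
    "RMagnitude": [("exactMatch", "other:MagnitudeR")],
    "JohnsonPhotometry": [("broadMatch", "other:Photometry")],
}


def translate(term, mappings, accept=("exactMatch", "closeMatch")):
    """Return the target terms whose mapping relation is in `accept`."""
    return [
        target
        for relation, target in mappings.get(term, [])
        if relation in accept
    ]


# A strict search follows only exact or close matches...
print(translate("RMagnitude", MAPPINGS))
# ...while a looser one may also accept broader targets.
print(translate("JohnsonPhotometry", MAPPINGS, accept=("exactMatch", "broadMatch")))
```

A term with no mapping simply yields an empty list, so a later and
better vocabulary can be added without touching the existing ones.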

As a separate thing:

> And I would also like to add that I'd like to see a *dictionary* of
> the vocabulary terms. This then would settle the semantic meaning of
> these tokens, which is the crucial missing link between vocabulary
> and ontology usage. I have been rebuffed/ignored about adding the
> definitions to the SKOS vocabulary

Have you?  Gracious, no: I think it's important to have scope-notes in  
the vocabulary where possible.  The only problem here is that the  
definitions in the IAU93 (coming back to that) are rather terse, and  
in many cases are just the IAU93 term translated to lowercase.  In  
that case, however, an elaborate scope note may not be vital, because  
most of those terms are immediately intelligible to the intended  
community (i.e., astronomers) to the degree of precision appropriate to  
a vocabulary (as opposed to the degree necessary for an ontology).

> which identify things which should be 'cleaned up' (probably by
> splitting the offending token into 2 or more other tokens each with
> separate meaning).

I'm all for cleaning things up, but I think we need to be vigilant  
against this tidy-up turning into a full-blown ontological exercise  
which, as the DM group can tell us, could end up taking five years  
before anyone notices it's late.  How about an effective definition of  
'cleaned up enough for the IVOAT initial release' being 'whatever can  
get done in the next month'?

All the best,

Norman


[1] I'm sure I'm getting an echo of something theological, here.


-- 
Norman Gray  :  http://nxg.me.uk
eurovotech.org  :  University of Leicester


