utype questions (issue 3 - vocabularies/ontologies)

Thu Jul 2 08:06:15 PDT 2009

Greetings, all.

Here are some remarks specifically about the third of the three issues  
that are being discussed, namely the questions of what vocabularies or  
ontologies we might actually end up using.

On 2009 Jun 30, at 02:41, Roy Williams wrote:

> Frederic Hessman wrote:
>> Sorry to dig back into the utype question, but why isn't the use of  
>> multiple, translatable vocabularies a la SKOS the ideal (indeed  
>> only) solution?  Don't want user readability, don't want to enforce  
>> a single usage, don't need an ontology, don't want to restrict  
>> mixing and matching as long as I can match what's been mixed, just  
>> need a good label.   Or am I being naive and/or single-minded?
>>
>> Rick
>>
> Rick et al
>
> Interesting thread you have started! So here is my 2 cents:
>
> -- Utypes is a system for precise, formal descriptions of data  
> structures so computers can find them.
>
> -- Vocabulary is a classification system that is often probabilistic  
> in usage (or should be).
>
> So it is oil and water, no? Then the question is less about  
> eliminating the boundary between these ways of thinking, but rather  
> what happens at the boundary. Perhaps we can ask the question of how  
> to link in each direction.

As Rick said, oil and vinegar is a better analogy.  Shake vigorously  
and you get something nicer than both.

For example, you might find 'redshift' in a data model, meaning 'this  
number is the value of the redshift that we measured'.  Here, it's  
clearly related to a UCD for redshift, which descends from a database  
column heading.  This is different from the _concept_ of redshift,  
meaning 'the thing that's proportional to recession velocity, and  
which the Supernova Cosmology Project has a lot of measurements of'.   
The former might appear in a data-model ontology, the latter in a  
vocabulary; the former tells you exactly what to do with the number  
that holds the redshift value, while the latter helps you find a  
database that is full of these measurements.

Crucially, although different people will typically develop the two  
artefacts, ontology and vocabulary, the two things will refer to each  
other, so that if you have a database which includes some measurements  
of datamodel-redshift, you can tell that this database may be about  
concept-redshift.
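The link from a data-model term to a vocabulary concept can be sketched in a few lines. This is a minimal illustration only: the URIs below are hypothetical placeholders, not real IVOA data-model or vocabulary terms, and a real deployment would more likely publish such links as RDF (for instance using SKOS mapping properties) than hold them in a Python dictionary.

```python
# Hypothetical identifiers, for illustration only: neither URI is a real
# IVOA data-model term or vocabulary concept.
DM_REDSHIFT = "http://example.org/datamodel#Redshift.value"
CONCEPT_REDSHIFT = "http://example.org/vocab#redshift"

# Assumed link table: data-model term -> vocabulary concepts it is "about".
DM_TO_CONCEPT = {
    DM_REDSHIFT: {CONCEPT_REDSHIFT},
}

def concepts_for_database(column_terms):
    """Return the vocabulary concepts a database may be about, given the
    data-model terms that annotate its columns."""
    concepts = set()
    for term in column_terms:
        concepts |= DM_TO_CONCEPT.get(term, set())
    return concepts

# A database with a column annotated as datamodel-redshift (plus one
# unlinked, equally hypothetical, position column).
columns = [DM_REDSHIFT, "http://example.org/datamodel#Position.ra"]
print(concepts_for_database(columns))  # {'http://example.org/vocab#redshift'}
```

Terms with no recorded link simply contribute nothing, which matches the "may be about" hedge: the link suggests a concept rather than asserting one.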

> The Utypes people might like to say that their quantity derives from  
> theoretical models of radio-quiet AGN. They want to link to:
> http://eurovotech.org/objects-structure#RadioQuietAGN

I may have given the impression that ontologies ~ data access ~  
measurements, and vocabularies ~ search ~ natural objects, but the CDS  
Ontology of Astronomical Object Types (OAOT) shows that it can be  
useful to have an ontology (in this case a very intricate and well-founded  
ontology) of natural objects, too.

Crib:

Vocabularies: loosely structured, typically referring to physical  
objects (star rather than redshift), most naturally useful for search,  
cheap.

Ontologies: more tightly structured (specifically statements like 'all  
LINERs are AGNs' or 'all Johnson R filters are red filters; all  
Cousins R filters are red filters'), typically referring (when we're  
talking about datamodels) to measurements, naturally useful for  
'understanding' data, costlier.

I think this addresses Rick's point at the top, and I hope it suggests  
that vocabularies and ontologies are two ends of a single continuum.
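To make the "tightly structured" end of that continuum concrete, here is a toy sketch of the inference that a statement such as 'all LINERs are AGNs' allows: a query for AGNs can also match anything declared to be a LINER. The class names are illustrative strings, and the subclass table is an assumption standing in for, say, rdfs:subClassOf statements processed by a proper reasoner.

```python
# Toy subclass statements paraphrasing the crib above; these are plain
# strings chosen for illustration, not terms from a published ontology.
SUBCLASS_OF = {
    "LINER": {"AGN"},
    "JohnsonR": {"RedFilter"},
    "CousinsR": {"RedFilter"},
}

def superclasses(cls):
    """All classes that `cls` belongs to, following subclass links transitively."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_a(cls, target):
    """True if every `cls` is also a `target` according to the table."""
    return cls == target or target in superclasses(cls)

print(is_a("LINER", "AGN"))           # True
print(is_a("CousinsR", "RedFilter"))  # True
print(is_a("AGN", "LINER"))           # False: not every AGN is a LINER
```

A vocabulary might only record that "LINER" and "AGN" are related terms; the directed subclass statement is what lets software draw the one-way conclusion.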

> How can the two sides cite each other so that we can refer to  
> technical and semantic concepts from each side to the other? How can  
> I use these special words for my own purposes: if I can choose to
> go with Utype *or* Vocabulary, which provides more services? How can  
> I get a definition?

I'm trying to keep this message short, but: yes we can!  I can go on  
about this at some length, but linking between these formal terms is  
where all the benefit comes from.  That's a little further down the line,
but can't happen unless we can get the IVOA's primary datamodel names  
to be compatible with the last decade's work in this area.

Also, Brian said (http://www.ivoa.net/forum/semantics/0906/0931.htm):

>> -- Utypes is a system for precise, formal descriptions of data 
>> structures so computers can find them.
>
> Perhaps what is really wrong here is that this attempt is trying to  
> find some mechanism to re-use the VOTable to hold complex hierarchical  
> models. There are common, formal, precise mechanisms already for  
> computers to read/parse and share and understand data models.

Unless I'm adrift here, I don't think we're talking just about  
VOTables, though those are the places where we most commonly see  
UTypes.  The goal here is to address the situation where -- however  
it's wrapped up in a VOTable, in random XML, or in 80-character FITS  
records -- you have a set of key-value pairs which together describe a  
measurement of some type.
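The point that the wrapper is incidental can be illustrated with a small sketch that reduces two wrappings of the same measurement to one set of key-value pairs. It assumes hypothetical keys REDSHIFT and ZERR and a simplified, namespace-free VOTable-like fragment; the FITS parsing is deliberately minimal and is not a full implementation of the FITS header rules.

```python
import xml.etree.ElementTree as ET

def from_fits(cards):
    """Collect KEYWORD = value pairs from 80-character FITS header cards.

    Deliberately simplified: it skips non-value cards, cuts off any
    '/ comment', and does not handle '/' inside quoted strings or
    CONTINUE cards.
    """
    pairs = {}
    for card in cards:
        card = card.ljust(80)
        if card[8:10] != "= ":
            continue  # comment, HISTORY or END card: no value
        key = card[:8].strip()
        value = card[10:].split("/", 1)[0].strip().strip("'").strip()
        pairs[key] = value
    return pairs

def from_xml(text):
    """Collect name/value pairs from PARAM elements in an XML fragment."""
    root = ET.fromstring(text)
    return {p.get("name"): p.get("value") for p in root.iter("PARAM")}

# The same (hypothetical) measurement, wrapped two different ways.
cards = [
    "REDSHIFT=               0.0123 / measured redshift",
    "ZERR    =               0.0004 / redshift uncertainty",
    "END",
]
xml_text = """<RESOURCE>
  <PARAM name="REDSHIFT" value="0.0123"/>
  <PARAM name="ZERR" value="0.0004"/>
</RESOURCE>"""

print(from_fits(cards) == from_xml(xml_text))  # True
```

Once both wrappers yield the same pairs, the remaining question is how each key is labelled (by a utype or otherwise) so that software knows what the pairs jointly describe.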

All the best,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester, UK