utype questions (issue 3 - vocabularies/ontologies)
Norman Gray
norman at astro.gla.ac.uk
Thu Jul 2 08:06:15 PDT 2009
Greetings, all.
Here are some remarks specifically about the third of the three issues
that are being discussed, namely the questions of what vocabularies or
ontologies we might actually end up using.
On 2009 Jun 30, at 02:41, Roy Williams wrote:
> Frederic Hessman wrote:
>> Sorry to dig back into the utype question, but why isn't the use of
>> multiple, translatable vocabularies a la SKOS the ideal (indeed
>> only) solution? Don't want user readability, don't want to enforce
>> a single usage, don't need an ontology, don't want to restrict
>> mixing and matching as long as I can match what's been mixed, just
>> need a good label. Or am I being naive and/or single-minded?
>>
>> Rick
>>
> Rick et al
>
> Interesting thread you have started! So here is my 2 cents:
>
> -- Utypes is a system for precise, formal descriptions of data
> structures so computers can find them.
>
> -- Vocabulary is a classification system that is often probabilistic
> in usage (or should be).
>
> So it is oil and water, no? Then the question is less about
> eliminating the boundary between these ways of thinking, but rather
> what happens at the boundary. Perhaps we can ask the question of how
> to link in each direction.
As Rick said, oil and vinegar is a better analogy. Shake vigorously
and you get something nicer than both.
For example, you might find 'redshift' in a data model, meaning 'this
number is the value of the redshift that we measured'. Here, it's
clearly related to a UCD for redshift, which descends from a database
column heading. This is different from the _concept_ of redshift,
meaning 'the thing that's proportional to recession velocity, and
which the Supernova Cosmology Project has a lot of measurements of'.
The former might appear in a data-model ontology, the latter in a
vocabulary; the former tells you exactly what to do with the number
which is the value of the redshift, the latter helps you find a
database which is full of these measurements.
Crucially, although different people will typically develop the two
artefacts, ontology and vocabulary, the two things will refer to each
other, so that if you have a database which includes some measurements
of datamodel-redshift, you can tell that this database may be about
concept-redshift.
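To make that cross-reference concrete, here is a minimal sketch in
Python. The identifiers ('dm:redshift', 'vocab:redshift', 'dm:ra' and
so on) and the function name are invented for illustration; they are
not real IVOA terms, and a real system would more likely use RDF than
a dictionary.

```python
# Hypothetical identifiers, invented for this sketch (not real IVOA URIs).
DM_REDSHIFT = "dm:redshift"          # data-model term: 'the value we measured'
CONCEPT_REDSHIFT = "vocab:redshift"  # vocabulary concept: the idea of redshift

# Cross-references from data-model terms to vocabulary concepts.
# The two artefacts are maintained separately; this table is the link.
DM_TO_CONCEPT = {DM_REDSHIFT: CONCEPT_REDSHIFT}


def concepts_for(columns):
    """Return the vocabulary concepts a database may be about,
    given the data-model terms attached to its columns."""
    return {DM_TO_CONCEPT[c] for c in columns if c in DM_TO_CONCEPT}


# A database whose columns carry these (hypothetical) data-model terms...
database_columns = ["dm:ra", "dm:dec", DM_REDSHIFT]
# ...can be discovered by a search for the redshift concept.
about = concepts_for(database_columns)
```

Terms with no cross-reference are simply skipped, so the search side
stays loose while the data-model side stays exact.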
> The Utypes people might like to say that their quantity derives from
> theoretical models of radio-quiet AGN. They want to link to:
> http://eurovotech.org/objects-structure#RadioQuietAGN
I may have given the impression that ontologies ~ data access ~
measurements, and vocabularies ~ search ~ natural objects, but the CDS
Ontology of Astronomical Object Types (OAOT) shows that it can be
useful to have an ontology (in this case a very intricate and well-
founded ontology) of natural objects, too.
Crib:
Vocabularies: loosely structured, typically referring to physical
objects (star rather than redshift), most naturally useful for search,
cheap.
Ontologies: more tightly structured (specifically statements like 'all
LINERs are AGNs' or 'all Johnson R filters are red filters; all
Cousins R filters are red filters'), typically referring (when we're
talking about datamodels) to measurements, naturally useful for
'understanding' data, costlier.
I think this addresses Rick's point at the top, and I hope suggests
that vocabularies and ontologies are two ends of a single continuum.
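As a rough illustration of what the 'tighter structure' of an ontology
buys you, the subclass statements in the crib can be written down and
reasoned over mechanically. This is only a sketch: the class names and
the is_a function are made up here, and giving each class a single
parent is a simplification that a real ontology language would not
impose.

```python
# The subclass statements from the crib, one parent per class
# (a simplification made for this sketch).
SUBCLASS_OF = {
    "LINER": "AGN",
    "JohnsonR": "RedFilter",
    "CousinsR": "RedFilter",
}


def is_a(term, ancestor):
    """True if term is ancestor, or a direct or indirect subclass of it."""
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)  # guard against accidental cycles in the table
        term = SUBCLASS_OF.get(term)
    return False
```

With this, a query for red filters can pick up both Johnson R and
Cousins R data without anyone having listed them by hand, which is the
sort of 'understanding' a bare vocabulary label does not give you.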
> How can the two sides cite each other so that we can refer to
> technical and semantic concepts from each side to the other? How can
> I use these special words for my own purposes: if I can choose to
> go with Utype *or* Vocabulary, which provides more services? How can
> I get a definition?
I'm trying to keep this message short, but: yes we can! I can go on
about this at some length, but linking between these formal terms is
where all the benefit comes from. That's a little further down the
line, but it can't happen unless we can get the IVOA's primary
datamodel names to be compatible with the last decade's work in this
area.
Also, Brian said (http://www.ivoa.net/forum/semantics/0906/0931.htm):
>> -- Utypes is a system for precise, formal descriptions of data
>> structures so computers can find them.
>
> Perhaps what is really wrong here is that this attempt is trying to
> find some mechanism to re-use the VOTable to hold complex
> hierarchical models. There are common, formal, precise mechanisms
> already for computers to read/parse and share and understand data
> models.
Unless I'm adrift, here, I don't think we're talking just about
VOTables, though those are the places where we most commonly see
UTypes. The goal here is to address the situation where you have a
set of key-value pairs which together describe a measurement of some
type, however they're wrapped up: in a VOTable, in random XML, or in
80-character FITS records.
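A small sketch of that situation, assuming an invented FITS keyword
(REDSHIFT), an invented XML layout, and an invented data-model term
('dm:redshift'); none of these is taken from a real standard. The
point is only that two different wrappers reduce to the same key-value
pairs.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from FITS keywords to data-model terms.
KEYWORD_TO_TERM = {"REDSHIFT": "dm:redshift"}


def from_fits_cards(cards):
    """Read 80-character 'KEYWORD = value / comment' records."""
    pairs = {}
    for card in cards:
        keyword = card[:8].strip()                  # columns 1-8
        value = card[10:].split("/")[0].strip()     # drop trailing comment
        if keyword in KEYWORD_TO_TERM:
            pairs[KEYWORD_TO_TERM[keyword]] = value
    return pairs


def from_xml(text):
    """Read <param utype="..." value="..."/> elements from arbitrary XML."""
    root = ET.fromstring(text)
    return {p.get("utype"): p.get("value") for p in root.iter("param")}


card = ("REDSHIFT= " + "0.158".rjust(20) + " / measured redshift").ljust(80)
xml_text = '<obs><param utype="dm:redshift" value="0.158"/></obs>'

fits_pairs = from_fits_cards([card])
xml_pairs = from_xml(xml_text)
```

Both calls yield the same dictionary, so whatever consumes the
measurement need not care which wrapper it arrived in.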
All the best,
Norman
--
Norman Gray : http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester, UK