Thes^H^H^H^HVocabularies (was: A murder of crows)
Norman Gray
norman at astro.gla.ac.uk
Wed Nov 21 13:18:02 PST 2007
Brian and Rob, hello.
On 2007 Nov 21, at 18:35, Rob Seaman wrote:
>> Those arguments are to do with audience (expert vs. non-expert)
>> and previous investments (three important journals already have
>> actual resources tagged with actual vocabulary items).
>
> So there are a number of curated vocabularies, one of which (the
> "IAU Thesaurus") is called a thesaurus, but is actually a
> vocabulary like all the rest?
Until today, I'd had the nagging feeling that a thesaurus was
something more exotic than it is. But no, it's just a vocabulary
plus light structure. So...
> As far as audience, we should refer to our work products by names
> designed to reach the non-experts.
Absolutely. And I think that 'vocabulary' would do perfectly well
for everyone.
I think we should carry on SKOSifying the structure already contained
within things like A&A, AOIM and IAU, because it will be valuable and
is free, but I suggest that outside this group we talk exclusively
about 'vocabularies', on the grounds that no-one but us knows the
distinction, and even we're not that bothered. Yes?
Brian:
> Human-to-machine interaction is not particularly relevant in my mind
> at this time, as it is, and is likely to be, an interface which is
> crafted by the individual archive/repository/tool builder. IF we were
> going ahead to specify a natural language query (NLQ) in which the
> terms of the thesaurus were to be used, then I can see a need for it.
> But a NLQ (particularly one which may be executed across the entire
> IVOA!!) is just far, far away and not as pressing as the issues of
> dataset labeling, machine to machine interchange and development of
> machine understanding of data (e.g. ontologies).
That's interesting -- I see it as very much the other way around!
I'm not thinking of full-scale NLQ, but simply getting the machine to
do something a bit brighter when a user types 'cataclysmic binary'
into a VOExplorer search box. "Ah: 'cataclysmic binary' is part of
an altLabel of the iau#cataclysmicvariablestars concept, so I'll make
that concept-query of Registry++; mmm, not many hits, so I'll
speculatively query iau#binarystars and iau#variablestars as well and
offer those to the user. In any case, by this time we're in logic-
land, so I'll find what CDS-AstroOnt ontology classes have
iau#cataclysmicvariablestars as a relatedConcept (say), because I
know that the CDS-AstroOnt classes have links to SIMBAD terms, so I
can hit the SIMBAD database, too. Plus, via inter-vocabulary links,
I now know what A&A concepts these relate to, and from their
prefLabels know which strings to look up in ADS." And so on.
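The first step of that story, going from a typed string to a vocabulary concept and then speculatively broadening, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not an existing VOExplorer or Registry interface: the names Concept, VOCAB, concepts_for, expand_query, count_hits and min_hits are all hypothetical, the IAU labels and broader links are invented for the example, and count_hits stands in for whatever Registry++ query would report the number of matching resources. It matches the string against each concept's prefLabel and altLabels (the SKOS label properties), and where a matched concept gets too few hits it adds that concept's skos:broader concepts.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A SKOS-like concept: one preferred label, optional alternatives, broader links."""
    uri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    broader: list[str] = field(default_factory=list)  # URIs of broader concepts


# Hypothetical vocabulary fragment; the labels and links are illustrative only.
VOCAB = {
    c.uri: c
    for c in [
        Concept("iau#cataclysmicvariablestars", "cataclysmic variable stars",
                alt_labels=["cataclysmic binary", "cataclysmic variable"],
                broader=["iau#binarystars", "iau#variablestars"]),
        Concept("iau#binarystars", "binary stars"),
        Concept("iau#variablestars", "variable stars"),
    ]
}


def concepts_for(text, vocab):
    """Return URIs of concepts whose prefLabel or an altLabel equals text, ignoring case."""
    needle = text.strip().casefold()
    return [
        c.uri
        for c in vocab.values()
        if needle == c.pref_label.casefold()
        or any(needle == alt.casefold() for alt in c.alt_labels)
    ]


def expand_query(text, vocab, count_hits, min_hits=5):
    """Map text to concepts; where a concept has fewer than min_hits hits,
    speculatively add its broader concepts as well."""
    result = []
    for uri in concepts_for(text, vocab):
        if uri not in result:
            result.append(uri)
        if count_hits(uri) < min_hits:
            for b in vocab[uri].broader:
                if b not in result:
                    result.append(b)
    return result


# count_hits is a stand-in for a (hypothetical) Registry++ concept query.
fake_hits = {"iau#cataclysmicvariablestars": 2}
print(expand_query("Cataclysmic binary", VOCAB, lambda uri: fake_hits.get(uri, 0)))
# ['iau#cataclysmicvariablestars', 'iau#binarystars', 'iau#variablestars']
```

Exact, case-insensitive label matching is simply the easiest thing to show; a real search box would probably want something fuzzier before handing the concepts on to the ontology and SIMBAD/ADS steps.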
Now, Brian and Ed could tell (and have frequently told) a very
similar story using only ontologies; indeed _I've_ told a similar
story using ontologies. But ontologies can do more than this, all
the way up to machine understanding of data, and we will need this,
just as you say.
What I see the vocabulary stuff as doing is a couple of relatively
simple things:
* gathering the low-hanging fruit represented by the minimally
structured keyword lists already in existence; and
* helping users get from strings to a controlled vocabulary, and
thence into ontology-land
* ...which should help with searching.
So that's why I see the vocabularies stuff as being easier, bringing
short-term gains, and providing a route into the fuller ontologies
work to come.
> If "Thesaurus" automatically implies machine-to-human interaction,
> then I apologize, and then move that we change to a compatible term
> which implies "machine-to-machine" instead (vocabulary? dictionary?)
I think 'ontologies' is good....
All the best,
Norman
--
------------------------------------------------------------
Norman Gray : http://nxg.me.uk
eurovotech.org : University of Leicester, UK