SV and Thesaurus - decide
Ed Shaya
eshaya at umd.edu
Fri Sep 21 07:14:29 PDT 2007
I guess it is time again to explain the OWLViper tool, to make a clear
use case for vocabularies and vocabulary extensions and vocabulary
translations. Brian Thomas will be presenting a poster at ADASS on
this. Even though the tool is not quite ready for launch, it still
provides a concrete example to think about the SV.
We have a tool that reads in OWL ontologies and creates a menu of
objects to choose from. You can grab AstroObjects and place them on the
canvas. Then you can choose properties of these objects (like
"hasMeasurement RotationalVelocity" or "hasPart Halo" or "hasStar
"Cepheid") and add them inside the object's box. Then you can constrain
values by min and max, or by a contains-string match. It can then query
for these objects or, if it already has data, it can go on to perform
operations on the data. But let's focus on the query case.
The best situation (from the application's point of view) is for all
datacenters to have exactly the same ontology and be able to respond to
requests for OWL subclasses. That is, the query, in the form of an
OWL class carrying the user's restrictions, is sent out, and the
datacenters return Individuals that belong to the restricted class. A
datacenter "simply" has to ingest all of its data into an OWL database,
and off-the-shelf reasoners can be used to respond to queries.
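To make the shape of such a query concrete, here is a sketch in
OWL/Turtle; all of the names (:AstroObject, :hasMeasurement,
:RotationalVelocity) are illustrative, not taken from any actual IVOA
ontology:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/astro#> .

# The user's on-screen selection, serialized as a restricted class:
# AstroObjects that have a RotationalVelocity measurement.  A
# datacenter's reasoner returns the Individuals it can classify
# under this class.
:QueryClass a owl:Class ;
    owl:intersectionOf (
        :AstroObject
        [ a owl:Restriction ;
          owl:onProperty :hasMeasurement ;
          owl:someValuesFrom :RotationalVelocity ]
    ) .
```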
It would be nice if we all used the same standard vocabulary, but that
may not be the case. What if each datacenter has its own ontology? You
could use the tool, one datacenter at a time, by loading in the ontology
of each datacenter, forming the query in that vocabulary and sending it
just to the one site. But this would be laborious. So someone has to
make "translations" from the tool's vocabulary to each datacenter's.
This just means using owl:equivalentClass and rdfs:subClassOf
statements. It can be more complicated than it sounds, because it may
require forming complex classes with owl:unionOf and
owl:intersectionOf. The tool is then told which namespace to use for
which datacenter.
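A sketch of what such a translation file might look like, in Turtle;
the ivoa: and dc1: terms are invented for illustration:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ivoa: <http://example.org/tool-vocab#> .
@prefix dc1:  <http://example.org/datacenter1#> .

# Simple one-to-one and subclass cases:
dc1:Galaxy        owl:equivalentClass ivoa:Galaxy .
dc1:SpiralGalaxy  rdfs:subClassOf     ivoa:Galaxy .

# A messier case needing a complex class: one datacenter term
# covering two of the tool's terms.
dc1:Transient owl:equivalentClass
    [ owl:unionOf ( ivoa:Nova ivoa:Supernova ) ] .
```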
Next. What to do about datacenters that are not quite so advanced and
don't use OWL? They may use an XML Schema to describe their data. Then
hopefully they can respond to an XQuery. One can fairly easily convert
OWL into XQuery; the only hard part is, again, having a mapping between
the terms. One way to proceed is to have software that automatically
transforms OWL into XQuery (say, an XSLT), which works as long as the
vocabularies are consistent. Then convert the Schema to OWL and provide
the "translations" as above. The tool can then form OWL queries in the
vocabulary of the datacenter, and the "standard" OWL-to-XQuery tool
finishes the job.
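As a rough sketch of that last OWL-to-XQuery step (the restriction
structure and element names are assumptions for illustration, not the
actual tool's internals):

```python
# A minimal sketch: a query expressed as (property, min, max)
# restrictions on a class is rendered as an XQuery FLWOR expression.
# Class and property names here are hypothetical.
def owl_to_xquery(cls, restrictions):
    """Render a class-with-restrictions query as an XQuery string."""
    conds = []
    for prop, lo, hi in restrictions:
        if lo is not None:
            conds.append(f"$x/{prop} >= {lo}")
        if hi is not None:
            conds.append(f"$x/{prop} <= {hi}")
    where = " and ".join(conds) if conds else "true()"
    return f"for $x in //{cls} where {where} return $x"

q = owl_to_xquery("Galaxy", [("rotationalVelocity", 200, None)])
print(q)
# for $x in //Galaxy where $x/rotationalVelocity >= 200 return $x
```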
For datacenters that use ADQL, the situation is quite similar to the
XQuery case. The relational database schema is similar to (indeed, can
be autotranslated into) an XML schema, and an OWL-to-ADQL
transformation is easy. By the way, it is always easy to
down-translate, as long as you accept that some complex queries will
have no equivalent in the simpler language.
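The down-translation to ADQL can be sketched the same way; the table
and column names below are invented, not a real datacenter schema:

```python
# Down-translation sketch: (column, min, max) restrictions rendered
# as an ADQL/SQL query.  Complex OWL constructs with no relational
# equivalent would simply be rejected in a real translator.
def owl_to_adql(table, restrictions):
    conds = []
    for col, lo, hi in restrictions:
        if lo is not None:
            conds.append(f"{col} >= {lo}")
        if hi is not None:
            conds.append(f"{col} <= {hi}")
    where = f" WHERE {' AND '.join(conds)}" if conds else ""
    return f"SELECT * FROM {table}{where}"

print(owl_to_adql("galaxy", [("rot_velocity", 200, 400)]))
# SELECT * FROM galaxy WHERE rot_velocity >= 200 AND rot_velocity <= 400
```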
Some conclusions. It probably is not a good idea to invent another
language for mapping between vocabularies, since two already exist:
SKOS and OWL. It probably is not a good idea to use SKOS since a) it is
not yet a Recommendation; b) work is going on to make it compatible
with OWL; c) OWL is more powerful and is either the language that we
will want in the long run or an ancestor of that language; and d) OWL
is a W3C Recommendation with real, existing, working, fairly stable
tools (editors, visualizers, and even wizards) and has support from a
number of important scientific fields with $$$$.
One can format an SV of Classes (tokens?) as ASCII text, using indents
to imply subclassing. Protege's Subclass Wizard will read it in and
create OWL instantly. One can format an SV with SKOS, and a couple of
substitution commands in vi will transform it to OWL. So I don't care
which of these is used at first. However, with OWL it is easier to
visualize, modify, and check for consistency, and one can get on with
adding properties (mostly connecting the Classes with the Measurements
that are pertinent to them, and the allowed ranges and datatypes).
One last note. I don't believe it is useful to have a vocabulary with
gamma, ray, and burst and then say that you have everything you need to
form the concept gamma_ray_burst. Adding some colons and semicolons
will not help. We need the full term gamma_ray_burst and explicit
machine-readable statements about it: subClassOf Explosion,
hasTimeSeries, TimeSeries hasDuration D and hasValue > J janskies. On
the other hand, I am not advocating terms like
gamma_ray_burstHasTimeSeriesHasDuration. I am advocating sticking
close to natural language. We speak it because it has been proven to
almost work.
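For what it's worth, the statements above might read in OWL/Turtle
roughly as follows (names illustrative; the duration and flux
restrictions are left out):

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/astro#> .

# GammaRayBurst as a first-class term with explicit, machine-readable
# statements, rather than an assembly of "gamma", "ray", and "burst".
:GammaRayBurst rdfs:subClassOf :Explosion ;
    rdfs:subClassOf [ a owl:Restriction ;
        owl:onProperty :hasTimeSeries ;
        owl:someValuesFrom :TimeSeries ] .
```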
Ed
Andrea Preite Martinez wrote:
>
>
>> sub-set officially. You can't build, from scratch, a list of words that
>> describes everything everyone is doing at the moment, let alone in the
>> future. It is fundamentally not possible to build a canonical and final
>> list of "stuff" in a subject like ours, which deals with broad topics
>> and changes on a fairly rapid time scale. Subjects that have done this
>> face more bounded problem sets.
>
> This is exactly the sense of my note at the bottom of the msg starting
> this thread.
> But my note was on Movie B = *the* ultimate vocabulary, or thesaurus, or
> dictionary, in astronomy, which is not
> (a) what was the request of the other WGs
> (b) by no means tackled in the draft.
> In the draft SV what you can find is "rotation" as indicating the
> spinning of something around an axis. You don't find, say, "rotating
> galaxies" or "rotating asteroid". You can use that token "rotation" in a
> variety of different situations, from rotating galaxies to rotating
> asteroids, without the anguish of having to foresee all possible concepts in all
> possible contexts, as you have to do in a thesaurus.
>
>> If you're going to go and build a framework where we can fit our own
>> words and collaboratively create and manage a vocabulary to annotate
>> and categorize our content then I vote YES. If what is being voted on
>> here is going off and building a big list of words, I vote NO.
>>
>> Al.
>
> I called for a decision on the draft SV: Doug and Rick have reminded us
> that at least 2 WGs need it. Practical use cases (like the basic ones I
> put in a recent msg) are not treated in the draft, so I assume that
> there is a request to edit the draft to include some. Perfect, this is
> why we are discussing the draft in the WG.
> We have all the elements to take a decision.
>
> I didn't call yet for a decision on the astronomical
> vocabulary/thesaurus/ontology.
> There are basic questions asked by Tony on this second topic (for
> instance: who needs it, and to do what?) that are not yet answered at all.
> We certainly need to discuss more on this.
>
> Cheers
> Andrea
>
> ===================================================================================
>
> Andrea Preite Martinez
> andrea.preitemartinez at iasf-roma.inaf.it
> IASF Tel.IASF:+39.06.4993.4641
> Via del Fosso del Cavaliere 100 Tel.CDS :+33.3.90242452
> I-00133 Roma Cell. :+39.320.43.15.383
> Skype :andrea.preite.martinez
> ===================================================================================