SV and Thesaurus - decide
Ed Shaya
eshaya at umd.edu
Fri Sep 21 07:14:29 PDT 2007
I guess it is time again to explain the OWLViper tool, to make a clear
use case for vocabularies and vocabulary extensions and vocabulary
translations. Brian Thomas will be presenting a poster at ADASS on
this. Even though the tool is not quite ready for launch, it still
provides a concrete example to think about the SV.
We have a tool that reads in OWL ontologies and creates a menu of
objects to choose from. You can grab AstroObjects and place them on the
canvas. Then you can choose properties of these objects (like
"hasMeasurement RotationalVelocity" or "hasPart Halo" or "hasStar
"Cepheid") and add them inside the object's box. Then you can constrain
values by min and max, or by a contains-string match. It can then query
for these objects or, if it already has data, it can go on to perform
operations on the data. But let's focus on the query case.
The best situation (from the application's point of view) is for all
datacenters to have exactly the same ontology and be able to respond to
requests for OWL subclasses. That is, the query, in the form of an
OWL class carrying the user's restrictions, is sent out, and the
datacenters return Individuals that belong to the restricted class. A
datacenter "simply" has to ingest all of its data into an OWL database,
and off-the-shelf reasoners can be used to respond to queries.
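To make the shape of such a query concrete, here is a sketch in
OWL/Turtle; all of the names (:AstroObject, :hasMeasurement,
:RotationalVelocity) are illustrative, not taken from any actual IVOA
ontology:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/astro#> .

# The user's on-screen selection, serialized as a restricted class:
# AstroObjects that have a RotationalVelocity measurement.  A
# datacenter's reasoner returns the Individuals it can classify
# under this class.
:QueryClass a owl:Class ;
    owl:intersectionOf (
        :AstroObject
        [ a owl:Restriction ;
          owl:onProperty :hasMeasurement ;
          owl:someValuesFrom :RotationalVelocity ]
    ) .
```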
It would be nice if we all used the same standard vocabulary, but that
may not be the case. What if each datacenter has its own ontology? You
could use the tool, one datacenter at a time, by loading in the ontology
of each datacenter, forming the query in that vocabulary and sending it
just to the one site. But this would be laborious. So someone has to
make "translations" from the tool's vocabulary to each datacenter's.
This just means using owl:equivalentClass and rdfs:subClassOf
statements. It can be more complicated than it sounds, because it may
require forming complex classes with owl:unionOf and
owl:intersectionOf. The tool is then told which namespace to use for
which datacenter.
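A sketch of what such a translation file might look like, in Turtle;
the ivoa: and dc1: terms are invented for illustration:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ivoa: <http://example.org/tool-vocab#> .
@prefix dc1:  <http://example.org/datacenter1#> .

# Simple one-to-one and subclass cases:
dc1:Galaxy        owl:equivalentClass ivoa:Galaxy .
dc1:SpiralGalaxy  rdfs:subClassOf     ivoa:Galaxy .

# A messier case needing a complex class: one datacenter term
# covering two of the tool's terms.
dc1:Transient owl:equivalentClass
    [ owl:unionOf ( ivoa:Nova ivoa:Supernova ) ] .
```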
Next. What to do about datacenters that are not quite so advanced and
don't use OWL? They may use an XML Schema to describe their data. Then
hopefully they can respond to an XQuery. One can fairly easily convert
OWL into XQuery; the only hard part is, again, having a mapping between
the terms. One way to proceed is to have software that automatically
transforms OWL into XQuery (say, an XSLT), which works as long as the
vocabularies are consistent. Then convert the Schema to OWL and provide
the "translations" as above. The tool can then form OWL queries in the
vocabulary of the datacenter, and the "standard" OWL-to-XQuery tool
finishes the job.
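As a rough sketch of that last OWL-to-XQuery step (the restriction
structure and element names are assumptions for illustration, not the
actual tool's internals):

```python
# A minimal sketch: a query expressed as (property, min, max)
# restrictions on a class is rendered as an XQuery FLWOR expression.
# Class and property names here are hypothetical.
def owl_to_xquery(cls, restrictions):
    """Render a class-with-restrictions query as an XQuery string."""
    conds = []
    for prop, lo, hi in restrictions:
        if lo is not None:
            conds.append(f"$x/{prop} >= {lo}")
        if hi is not None:
            conds.append(f"$x/{prop} <= {hi}")
    where = " and ".join(conds) if conds else "true()"
    return f"for $x in //{cls} where {where} return $x"

q = owl_to_xquery("Galaxy", [("rotationalVelocity", 200, None)])
print(q)
# for $x in //Galaxy where $x/rotationalVelocity >= 200 return $x
```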
For datacenters that use ADQL, the situation is quite similar to the
XQuery case. The relational database schema is similar to (indeed, can
be autotranslated into) an XML schema, and an OWL-to-ADQL
transformation is easy. By the way, it is always easy to
down-translate, as long as you accept that some complex queries will
have no equivalent in the simpler language.
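The down-translation to ADQL can be sketched the same way; the table
and column names below are invented, not a real datacenter schema:

```python
# Down-translation sketch: (column, min, max) restrictions rendered
# as an ADQL/SQL query.  Complex OWL constructs with no relational
# equivalent would simply be rejected in a real translator.
def owl_to_adql(table, restrictions):
    conds = []
    for col, lo, hi in restrictions:
        if lo is not None:
            conds.append(f"{col} >= {lo}")
        if hi is not None:
            conds.append(f"{col} <= {hi}")
    where = f" WHERE {' AND '.join(conds)}" if conds else ""
    return f"SELECT * FROM {table}{where}"

print(owl_to_adql("galaxy", [("rot_velocity", 200, 400)]))
# SELECT * FROM galaxy WHERE rot_velocity >= 200 AND rot_velocity <= 400
```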
Some conclusions. It probably is not a good idea to invent another
language for mapping between vocabularies, since two already exist:
SKOS and OWL. It probably is not a good idea to use SKOS since a) it is
not yet a Recommendation; b) work is going on to make it compatible
with OWL; c) OWL is more powerful and is either the language that we
will want in the long run or an ancestor of that language; and d) OWL
is a W3C Recommendation with real, existing, working, fairly stable
tools (editors, visualizers, and even wizards) and has support from a
number of important scientific fields with $$$$.
One can format an SV of Classes (tokens?) as ASCII text, using indents
to imply subclassing. Protege's Subclass Wizard will read it in and
create OWL instantly. One can format an SV with SKOS, and a couple of
substitution commands in vi will transform it to OWL. So I don't care
which of these is used at first. However, with OWL it is easier to
visualize, modify, and check for consistency, and one can get on with
adding properties (mostly connecting the Classes with the Measurements
that are pertinent to them, and the allowed ranges and datatypes).
One last note. I don't believe it is useful to have a vocabulary with
gamma, ray, and burst and then say that you have everything you need to
form the concept gamma_ray_burst. Adding some colons and semicolons
will not help. We need the full term gamma_ray_burst and explicit
machine-readable statements about it: subClassOf Explosion,
hasTimeSeries, TimeSeries hasDuration D and hasValue > J janskies. On
the other hand, I am not advocating terms like
gamma_ray_burstHasTimeSeriesHasDuration. I am advocating sticking
close to natural language. We speak it because it has been proven to
almost work.
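For what it's worth, the statements above might read in OWL/Turtle
roughly as follows (names illustrative; the duration and flux
restrictions are left out):

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/astro#> .

# GammaRayBurst as a first-class term with explicit, machine-readable
# statements, rather than an assembly of "gamma", "ray", and "burst".
:GammaRayBurst rdfs:subClassOf :Explosion ;
    rdfs:subClassOf [ a owl:Restriction ;
        owl:onProperty :hasTimeSeries ;
        owl:someValuesFrom :TimeSeries ] .
```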
Ed
Andrea Preite Martinez wrote:
>
>
>> sub-set officially. You can't build, from scratch, a list of words that
>> describes everything everyone is doing at the moment, let alone in the
>> future. It is fundamentally not possible to build a canonical and final
>> list of "stuff" in a subject like ours, which deals with broad topics
>> and changes on a fairly rapid time scale. Subjects that have done this
>> face more bounded problem sets.
>
> This is exactly the sense of my note at the bottom of the msg starting
> this thread.
> But my note was on Movie B = *the* ultimate vocabulary, or thesaurus, or
> dictionary, in astronomy, which is not
> (a) what was the request of the other WGs
> (b) by no means tackled in the draft.
> In the draft SV what you can find is "rotation" as indicating the
> spinning of something around an axis. You don't find, say, "rotating
> galaxies" or "rotating asteroid". You can use that token "rotation" in a
> variety of different situations, from rotating galaxies to rotating
> asteroids, without the anguish of having to foresee all possible concepts in all
> possible contexts, as you have to do in a thesaurus.
>
>> If you're going to go and build a framework where we can fit our own
>> words and collaboratively create and manage a vocabulary to annotate
>> and categorize our content then I vote YES. If what is being voted on
>> here is going off and building a big list of words, I vote NO.
>>
>> Al.
>
> I called for a decision on the draft SV: Doug and Rick have reminded us
> that at least 2 WGs need it. Practical use cases (like the basic ones I
> put in a recent msg) are not treated in the draft, so I assume that
> there is a request to edit the draft to include some. Perfect, this is
> why we are discussing the draft in the WG.
> We have all the elements to take a decision.
>
> I didn't call yet for a decision on the astronomical
> vocabulary/thesaurus/ontology.
> There are basic questions asked by Tony on this second topic (for
> instance: who needs it, and to do what?) that are not yet answered at all.
> We certainly need to discuss more on this.
>
> Cheers
> Andrea
>
> ===================================================================================
>
> Andrea Preite Martinez
> andrea.preitemartinez at iasf-roma.inaf.it
> IASF Tel.IASF:+39.06.4993.4641
> Via del Fosso del Cavaliere 100 Tel.CDS :+33.3.90242452
> I-00133 Roma Cell. :+39.320.43.15.383
> Skype :andrea.preite.martinez
> ===================================================================================