Vocabulary: Ontology

Fri Sep 14 03:07:16 PDT 2007

On 14 Sep 2007, at 12:24 am, Norman Gray wrote:

>> This is _exactly_ the other half of my point: we don't want to  
>> insist that every VO user/programmer be forced to "create your own  
>> professional vocabulary, publish it in RDF, and either let others  
>> declare equivalence, or be proactive and declare equivalence from  
>> your side if you think it's useful".
>
> Indeed not, but there are existing vocabularies, with different  
> ones familiar to different user groups (of course, that's what this  
> whole discussion is about).  We've effectively discovered yet  
> another one -- the emergent wikipedia vocabulary -- in the course  
> of this thread.  Having multiple vocabularies is simply a fact of  
> life (welcome to the web, welcome to science), and adding a new one  
> won't magically make this 'problem' go away.

No indeed.

> The suggestion made here in detail by Bernard and alluded to in my  
> first message in this thread is that, rather than create a new  
> vocabulary, we might instead document the relationships between  
> existing deployed vocabularies in a machine-readable way, with the  
> goal that data creators can use the vocabulary they prefer, and  
> data consumers can 'hear' the vocabulary _they_ prefer.
>
> That's not _completely_ trivial, for formal and performance  
> reasons.  Is this computationally expensive? (this is Alasdair's  
> point about having to analyse VOEvent packets in real time)  Yes,  
> if you do the mapping dynamically; no, if you 'compile' the  
> mappings offline and let the real-time application use the result  
> -- obviously a better idea in that case, and a technique used in  
> other domains.
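Here is a minimal Python sketch of the "compile offline" option. It assumes nothing more than a list of pairwise equivalence declarations. The tokens, the prefixes (ucd:, sv:, aoim:, xyz:) and the pairings between them are all hypothetical and chosen only for illustration. The offline step merges the declarations into equivalence classes with a union-find pass and flattens the result into a dictionary. The real-time step is then just two dictionary lookups, with no reasoning at all.

```python
# Hypothetical pairwise equivalence declarations between vocabulary tokens;
# the prefixes and pairings are illustrative, not taken from real mappings.
EQUIVALENCES = [
    ("ucd:em.optical", "sv:optical"),
    ("sv:optical", "aoim:Optical"),
    ("sv:GRB", "xyz:gammaRayBurst"),
]


def compile_mappings(pairs):
    """Offline step: merge the declarations into equivalence classes
    (union-find) and flatten them into a token -> representative dict."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Deterministic choice: the alphabetically smaller root wins.
            parent[max(ra, rb)] = min(ra, rb)
    return {token: find(token) for token in parent}


COMPILED = compile_mappings(EQUIVALENCES)


def same_concept(a, b, table=COMPILED):
    """Real-time step: two dictionary lookups, no reasoning at run time.
    Tokens absent from the table are only equivalent to themselves."""
    return table.get(a, a) == table.get(b, b)
```

With these declarations, same_concept("ucd:em.optical", "aoim:Optical") is true even though the two tokens were never declared equivalent directly. The chain through "sv:optical" was resolved once, offline.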

In fact, our original VOcabulary schema was meant to encourage the  
production and use of other vocabularies, but with the benefit of a  
robust Standard Vocabulary, so that the amount of  
pre-translation/ontological research needed in order to get a working vocabulary  
going would be small.

On the other hand, the average VO developer doesn't really want to  
get "BLUE STRAGGLER STARS" from the IAU Thesaurus, "Star : Type:  
Eclipsing" from the AOIM, "em.optical" from UCD, and "Radial  
Velocities" from the ADC before being able to do some simple  
describing.  That all of these exist is fine, and if they all  
reference each other one can eventually get a tool working that  
tries to find the best match, but life would still be MUCH simpler if  
there were a robust starting point that is good enough for most  
purposes.  Then you only need the fancy tool for the specialized,  
difficult cases.

> Is it formally doable?  Yes.  You wouldn't have to be super-clever  
> about it to get something very functional, and you could get a very  
> simple but useful version without any reasoning at all (this much  
> is quite like the declarations of equivalences in the VOcabulary  
> document, minus the new vocabulary).
>
> Thus we get more value for less work.  That's a good thing.
>
> Putting in the hooks for a potential future tie-in with the likes  
> of wikipedia seems like an outreach no-brainer to me.
>
>> 	- replace all the XML/Schema in the "Note" aka "Working Draft",  
>> substituting trivially a simple SKOS/RDF equivalent (see my toy  
>> example at http://www.astro.physik.uni-goettingen.de/~hessman/rdf  
>> which I'm sure you can all quickly improve upon) and publish it as  
>> a true proposal with working examples like UCD, AOIM, A&A,...  
>> (easy - can be done in a day;
>
> That sounds like a good plan.  I'll try the same, if I might,  
> without studying yours, and we can see where we match (would you be  
> able to give me a copy of your plain-text source document with the  
> equivalences identified?  Or is that source file essentially  
> <http://www.ivoa.net/internal/IVOA/IvoaSemantics/AOIM_index.html>,  
> say, with the markup stripped out?).

The ASCII files for all the vocabularies we used are available at

	http://www.astro.physik.uni-goettingen.de/~hessman/rdf/ASCII/

which include VOcabulary-prepared headers for the following  
vocabularies:

	AAkeys-v1.0.txt     the A&A journal keywords
	ADC-v1.0.txt        the ADC keywords
	AOIM-v1.0.txt       the AOIM taxonomy
	HOU-v1.0.txt        the HOU image keywords
	SV-v1.0.txt         the proposed Standard Vocabulary keywords
	UCD1-v1.0.txt       UCD1+
	Vizier-1.0.txt      the Vizier keywords
	VOEvent-v1.0.txt    a toy VOEvent vocabulary

for anyone who would like to give it a try.  The contents are  
'|'-separated fields.  The VOcabulary XML headers are "commented out"  
with a leading "#HEADER" on each line.
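For anyone trying this, a minimal Python reader for these files might look like the following. It relies only on the two rules stated above: lines starting with "#HEADER" carry the commented-out XML header, and other lines are '|'-separated fields. The function name, the skipping of blank lines and the whitespace stripping are assumptions. No meaning is attached to the individual fields.

```python
def read_vocabulary(lines):
    """Split a VOcabulary-style ASCII file into header text and records.

    Assumptions (not part of any published format description): blank
    lines can be ignored, and fields may carry surrounding whitespace.
    """
    header, records = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#HEADER"):
            # Drop the marker to recover the commented-out XML header text.
            header.append(line[len("#HEADER"):].strip())
        elif line.strip():
            # Any other non-blank line is one record of '|'-separated fields.
            records.append([field.strip() for field in line.split("|")])
    return header, records
```

Since the function accepts any iterable of lines, an open file object works directly, e.g. inside `with open("AOIM-v1.0.txt", encoding="utf-8") as f:`. The UTF-8 encoding there is a guess.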

While I appreciate Norman's idea of trying several approaches, there  
can't be all that many options for defining a simple and useful basic  
scheme.  Maybe what we REALLY need are some use cases showing what is  
possible and where solutions might be less obvious or tricky.

>> 	- suggest that the IVOA accept this simple, nearly totally  
>> globally-standardized SKOS/RDF format/subset/extension as the  
>> recommended format for all VO vocabularies like UCD, AOIM, .....  
>> (after all, SKOS/RDF is already defined, there shouldn't be much  
>> to discuss if we can agree we're publishing a list of tokens with  
>> a minimal amount of RDF baggage);
>
> +1
>
> As it happens, I was talking to one of the original SKOS developers  
> this very morning, being filled in on some of SKOS's background  
> (the information retrieval and library communities) and roadmap  
> (the current version of SKOS Core is likely to suffer no more than  
> peripheral tweaks before PR).

That's certainly comforting.

> By the way, and just for the record, I'm not prejudging that  
> specifically SKOS would be the answer that best suits us.  I'm not  
> suggesting that we commit to it before we've taken a really close  
> look, though I would lay money that that's how it will turn out.

Ah, but the point is that the format shouldn't make much difference -  
remember, we're still talking about just publishing vocabularies.

>> Note that the RDF solution gives us connectivity to the web-world  
>> and visions of automatic ontologies, but robs us of the simple usage:
>>
>> 			<Guess ucd="em.opt;sv:GRB;xyz:niftyIdea">
>>
>> which VOEvent types tend to like - about as succinct as it gets.
>
> RDF is indeed verbose when it's written down, so the trick in this  
> sort of situation is to hide the RDF -- essentially, users should  
> never see raw RDF, but only things which have a one-to-one  
> mapping to this nice manageable substrate (there's a reason why RDF  
> is defined as a data _model_, and not a data format or notation --  
> notational approachability was not, I think, a goal).  That implies  
> a syntax problem to solve, but it's been reduced to one which has a  
> clear principled target, and isn't trying to solve two orthogonal  
> problems at once.

In fact, the _users_ shouldn't see the tokens at all, no matter what  
format they are in.
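The following Python sketches the one-to-one mapping Norman describes. The succinct attribute value is treated as nothing more than a surface syntax for a list of concept URIs. The namespace URIs are placeholders, not the real published ones. Treating an unprefixed token such as "em.opt" as a UCD word is also an assumption. The point is only that expand() and compact() invert each other, so nothing is lost if VOEvent authors keep writing the short form.

```python
# Placeholder namespace URIs (hypothetical, not the published ones).
PREFIXES = {
    "ucd": "http://example.org/vocab/ucd#",
    "sv": "http://example.org/vocab/sv#",
    "xyz": "http://example.org/vocab/xyz#",
}
DEFAULT_PREFIX = "ucd"  # assumption: an unprefixed token is a UCD word


def expand(attr, prefixes=PREFIXES, default=DEFAULT_PREFIX):
    """Turn 'em.opt;sv:GRB;xyz:niftyIdea' into a list of full concept URIs."""
    uris = []
    for token in attr.split(";"):
        token = token.strip()
        if not token:
            continue  # tolerate stray or trailing separators
        prefix, sep, local = token.partition(":")
        if not sep:
            prefix, local = default, token
        uris.append(prefixes[prefix] + local)  # KeyError for unknown prefixes
    return uris


def compact(uris, prefixes=PREFIXES, default=DEFAULT_PREFIX):
    """Inverse of expand(): rebuild the succinct attribute value."""
    tokens = []
    for uri in uris:
        for prefix, base in prefixes.items():
            if uri.startswith(base):
                local = uri[len(base):]
                tokens.append(local if prefix == default else f"{prefix}:{local}")
                break
        else:
            raise ValueError(f"no known prefix for {uri}")
    return ";".join(tokens)
```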

> * The trouble with NOT
>
> Going back a bit, the fundamental trouble with the notion of not(X)  
> is the 'Open World Assumption'.  If a galaxy is not listed as being  
> a member of a cluster, is this because it isn't a member of a  
> cluster, it isn't _known_ to be, or it's known but isn't listed as  
> such?  The 'Open World Assumption' is the acknowledgement that not  
> stating something doesn't mean it's not true.  This can often be  
> handled one way or another, but it's a subtle gotcha so common that  
> it's got its own name.
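The difference can be shown in a few lines of Python. The catalogue below, with galaxies G1 to G3 and cluster C1, is entirely hypothetical. A closed-world query turns the missing statement about G3 into "false". An open-world query can only answer "unknown" unless a negative statement has been made explicitly.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical catalogue: G1 is listed as a member of cluster C1,
# G2 is explicitly listed as NOT a member, and nothing is said about G3.
MEMBERS = {("G1", "C1")}
NON_MEMBERS = {("G2", "C1")}


def closed_world(galaxy, cluster):
    """Closed world: anything not listed as a member is taken to be false."""
    return Truth.TRUE if (galaxy, cluster) in MEMBERS else Truth.FALSE


def open_world(galaxy, cluster):
    """Open world: a missing statement means 'unknown'; only an explicit
    negative statement yields FALSE."""
    if (galaxy, cluster) in MEMBERS:
        return Truth.TRUE
    if (galaxy, cluster) in NON_MEMBERS:
        return Truth.FALSE
    return Truth.UNKNOWN


for g in ("G1", "G2", "G3"):
    print(g, closed_world(g, "C1").value, open_world(g, "C1").value)
# G1 true true
# G2 false false
# G3 false unknown
```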

This and similar problems boil down to one implicit question: how much  
ontological baggage is "good enough" to be able to say things like  
"this is an orphan transient and not a GRB"?  Is it enough for  
VOEvent to use something like

	<rdf:Bag>
		<Concept rdf:resource="&ucd;src"/>
		<Concept rdf:resource="&sv;time.variation.burst"/>
		<Concept rdf:resource="&ucd;em.optical"/>
		<skosmapping:NOT>
			<Concept rdf:resource="&sv;GRB"/>
		</skosmapping:NOT>
	</rdf:Bag>

with the implied meaning of "Bag" (i.e. every member applies, in no  
particular order)?

Does this really give us connectivity to the SKOS world, or do we need a  
different format?

Should the IVOA suggest a format for such statements, so that  
information from different VO and non-VO contexts can more easily be  
connected later into some global ontological scheme?

Are the RDF concepts of concept collections ("Bag"), sequences  
("Seq"), and alternatives ("Alt"), plus the SKOS mapping constructs  
AND, OR, and NOT, good enough for the minimal manipulation,  
communication, and concept documentation we need for 99% of the VO?
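An explicit NOT earns its keep when you evaluate AND/OR/NOT queries against a Bag-like description using three-valued (Kleene) logic. The Python sketch below is not SKOS or RDF machinery. It assumes a description is simply a set of asserted concepts plus a set of explicitly negated ones, mirroring the Bag example above. Anything else is treated as unknown in the open-world sense, and Seq and Alt are left out. Because the event is explicitly marked as not a GRB, the query "burst AND optical AND NOT GRB" gets a definite answer rather than "unknown".

```python
# Hypothetical event description modelled on the Bag example above:
# asserted concepts plus explicitly negated ones.
EVENT = {
    "asserted": {"ucd:src", "sv:time.variation.burst", "ucd:em.optical"},
    "negated": {"sv:GRB"},
}


def holds(concept, desc):
    """Open-world lookup: True, False, or None (unknown)."""
    if concept in desc["asserted"]:
        return True
    if concept in desc["negated"]:
        return False
    return None


def evaluate(expr, desc):
    """Evaluate a nested ('AND' | 'OR' | 'NOT', ...) query with Kleene
    three-valued logic: an unknown operand still allows a definite
    answer when the other operands decide the result."""
    if isinstance(expr, str):
        return holds(expr, desc)
    op, *args = expr
    values = [evaluate(a, desc) for a in args]
    if op == "NOT":
        (v,) = values
        return None if v is None else not v
    if op == "AND":
        if False in values:
            return False
        return None if None in values else True
    if op == "OR":
        if True in values:
            return True
        return None if None in values else False
    raise ValueError(f"unknown operator {op!r}")


orphan = ("AND", "sv:time.variation.burst", "ucd:em.optical", ("NOT", "sv:GRB"))
print(evaluate(orphan, EVENT))  # True
```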

Rick

P.S. A purely technical point: are the SKOS labels the actual tokens,  
or are the resource names the point?  Is an alias better described  
using a <skos:altLabel> or as a reference?
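One possible reading, offered as an assumption and not a settled answer: the resource URI is the token that gets compared, while skos:prefLabel and skos:altLabel are just strings that resolve to it. An alias would then simply be an altLabel on the same resource. Below is a minimal Python sketch of that reading, with hypothetical concepts and placeholder URIs.

```python
# Hypothetical concept records: resource URI -> SKOS-style labels.
CONCEPTS = {
    "http://example.org/vocab/sv#GRB": {
        "prefLabel": "gamma-ray burst",
        "altLabels": ["GRB", "gamma ray burst"],
    },
    "http://example.org/vocab/sv#transient": {
        "prefLabel": "transient",
        "altLabels": ["transient source"],
    },
}


def build_label_index(concepts):
    """Map every preferred and alternative label (case-folded) to the
    single resource URI that carries it; refuse ambiguous labels."""
    index = {}
    for uri, labels in concepts.items():
        for label in [labels["prefLabel"], *labels["altLabels"]]:
            key = label.casefold()
            if index.get(key, uri) != uri:
                raise ValueError(f"label {label!r} is ambiguous")
            index[key] = uri
    return index


INDEX = build_label_index(CONCEPTS)
```

Under this reading, "GRB", "gamma ray burst" and "gamma-ray burst" all resolve to the same resource. Software compares URIs, and the labels stay a presentation matter.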

--------------------------------------------------------------------------------
Dr. Frederic V. Hessman     Hessman at Astro.physik.Uni-Goettingen.DE
Institut für Astrophysik    Tel. +49-551-39-5052
Friedrich-Hund-Platz 1      Fax  +49-551-39-5043
37077 Goettingen            Room F04-133
http://www.Astro.physik.Uni-Goettingen.de/~hessman
--------------------------------------------------------------------------------
MONET: a MOnitoring NEtwork of Telescopes
http://monet.Uni-Goettingen.de
--------------------------------------------------------------------------------
