Vocabulary: Ontology

Norman Gray norman at astro.gla.ac.uk
Thu Sep 13 15:24:31 PDT 2007


Rick and others, hello.

Other people have already made most of the points I've wanted to  
make, and for most of the remainder, the time for them has now  
passed.  Also, I can sense Andrea moving to wrap things up a bit.   
There are a couple of points left over, though.

On 2007 Sep 12, at 10:59, Frederic V. Hessman wrote:
> This is _exactly_ the other half of my point: we don't want to  
> insist that every VO user/programmer be forced to "create your own  
> professional vocabulary, publish it in RDF, and either let others  
> declare equivalence, or be proactive and declare equivalence from  
> your side if you think it's useful".

Indeed not, but there are existing vocabularies, with different ones  
familiar to different user groups (of course, that's what this whole  
discussion is about).  We've effectively discovered yet another one  
-- the emergent wikipedia vocabulary -- in the course of this  
thread.  Having multiple vocabularies is simply a fact of life  
(welcome to the web, welcome to science), and adding a new one won't  
magically make this 'problem' go away.

The suggestion made here in detail by Bernard and alluded to in my  
first message in this thread is that, rather than create a new  
vocabulary, we might instead document the relationships between  
existing deployed vocabularies in a machine-readable way, with the  
goal that data creators can use the vocabulary they prefer, and data  
consumers can 'hear' the vocabulary _they_ prefer.

That's not _completely_ trivial, for formal and performance reasons.
Is it computationally expensive?  (This is Alasdair's point about
having to analyse VOEvent packets in real time.)  Yes, if you do the
mapping dynamically; no, if you 'compile' the mappings offline and
let the real-time application use the result.  The latter is obviously
the better idea in that case, and it is a technique used in other domains.
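
To make the 'compile offline' idea concrete, here is a minimal sketch
in Python.  It is only an illustration under assumptions: the term
names, the vocabulary prefixes and the function names are all invented
for the example, and grouping terms with a union-find is just one way
one might do the offline step.

```python
from collections import defaultdict

# Hypothetical equivalence declarations between vocabulary terms.
# None of these identifiers are taken from a real vocabulary.
EQUIVALENCES = [
    ("aoim:Supernova", "wikipedia:Supernova"),
    ("wikipedia:Supernova", "aa:supernovae"),
    ("aoim:GammaRayBurst", "wikipedia:Gamma-ray_burst"),
]


def compile_mappings(pairs):
    """Offline step: group terms into equivalence classes (union-find)
    and return a flat dict mapping each term to its whole class."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = defaultdict(set)
    for term in list(parent):
        classes[find(term)].add(term)
    return {t: frozenset(members)
            for members in classes.values() for t in members}


# Built once, ahead of time; a real-time consumer would only load it.
TABLE = compile_mappings(EQUIVALENCES)


def translate(term, wanted_prefix, table=TABLE):
    """Real-time step: one dict lookup, then a scan of a small class
    for a term in the vocabulary the consumer prefers."""
    for candidate in sorted(table.get(term, ())):
        if candidate.startswith(wanted_prefix):
            return candidate
    return None


print(translate("aoim:Supernova", "aa:"))
```

The point of the split is that all the work of following chains of
equivalences happens in compile_mappings(), so the application handling
packets only ever does the cheap lookup in translate().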

Is it formally doable?  Yes.  You wouldn't have to be super-clever  
about it to get something very functional, and you could get a very  
simple but useful version without any reasoning at all (this much is  
quite like the declarations of equivalences in the VOcabulary  
document, minus the new vocabulary).
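
As a rough illustration of the 'no reasoning at all' version, the
Python sketch below uses only the equivalences that someone has
explicitly declared, and infers nothing from them.  The term names and
the function name are invented for the example.

```python
# Hypothetical declared equivalences, as (term, equivalent term) pairs.
# All names are invented for illustration.
DECLARED = {
    ("aoim:Supernova", "wikipedia:Supernova"),
    ("aa:supernovae", "wikipedia:Supernova"),
}


def direct_equivalents(term, declared=DECLARED):
    """Return only the terms explicitly declared equivalent to `term`.
    No chains are followed, so nothing is inferred."""
    found = set()
    for a, b in declared:
        if a == term:
            found.add(b)
        elif b == term:
            found.add(a)
    return found


print(sorted(direct_equivalents("wikipedia:Supernova")))
print(sorted(direct_equivalents("aoim:Supernova")))
```

Note what this deliberately does not do: asking about
"aoim:Supernova" does not return "aa:supernovae", because nobody
declared that pair directly.  Getting that answer is where a (very
modest) amount of reasoning would come in.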

Thus we get more value for less work.  That's a good thing.

Putting in the hooks for a potential future tie-in with the likes of  
wikipedia seems like an outreach no-brainer to me.

> 	- replace all the XML/Schema in the "Note" aka "Working Draft",  
> substituting trivially a simple SKOS/RDF equivalent (see my toy  
> example at http://www.astro.physik.uni-goettingen.de/~hessman/rdf  
> which I'm sure you can all quickly improve upon) and publish it as  
> a true proposal with working examples like UCD, AOIM, A&A,... (easy  
> - can be done in a day;

That sounds like a good plan.  I'll try the same, if I might, without  
studying yours, and we can see where we match (would you be able to  
give me a copy of your plain-text source document with the  
equivalences identified?  Or is that source file essentially
<http://www.ivoa.net/internal/IVOA/IvoaSemantics/AOIM_index.html>, say,
with the markup stripped out?).

> 	- suggest that the IVOA accept this simple, nearly totally  
> globally-standardized SKOS/RDF format/subset/extension as the  
> recommended format for all VO vocabularies like UCD, AOIM, .....  
> (after all, SKOS/RDF is already defined, there shouldn't be much to  
> discuss if we can agree we're publishing a list of tokens with a  
> minimal amount of RDF baggage);

+1

As it happens, I was talking to one of the original SKOS developers  
this very morning, being filled in on some of SKOS's background (the  
information retrieval and library communities) and roadmap (the  
current version of SKOS Core is likely to suffer no more than  
peripheral tweaks before PR).

By the way, and just for the record, I'm not prejudging that SKOS
specifically would be the answer that best suits us.  I'm not
suggesting that we commit to it before we've taken a really close
look, though I would lay money that that's how it will turn out.






I'll just add a couple of further points here.

* RDF and verbosity

Back on the 10th, Rick said:

> Note that the RDF solution gives us connectivity to the web-world  
> and visions of automatic ontologies, but robs us of the simple usage:
>
> 			<Guess ucd="em.opt;sv:GRB;xyz:niftyIdea">
>
> which VOEvent types tend to like - about as succinct as it gets.

RDF is indeed verbose when it's written down, so the trick in this  
sort of situation is to hide the RDF -- essentially, users should  
never see raw RDF, but only things which have a one-to-one  
mapping to this nice manageable substrate (there's a reason why RDF  
is defined as a data _model_, and not a data format or notation --  
notational approachability was not, I think, a goal).  That implies a  
syntax problem to solve, but it's been reduced to one which has a  
clear principled target, and isn't trying to solve two orthogonal  
problems at once.

So I entirely take your point here, and agree it's an extremely  
important one.
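
To show what 'hiding the RDF' might look like for Rick's example, here
is a small Python sketch that expands the compact attribute string
into subject/predicate/object triples and collapses it back again.
Everything apart from the string itself is an assumption made for the
example: the namespace URIs, the predicate and the function names are
invented, and the semicolon and colon conventions are simply read off
the example above.

```python
# Hypothetical namespace URIs; none of these are real IVOA identifiers.
# The empty prefix stands for tokens written without any prefix.
NAMESPACES = {
    "": "http://example.org/ucd#",
    "sv": "http://example.org/sv#",
    "xyz": "http://example.org/xyz#",
}
HAS_CONCEPT = "http://example.org/terms#hasConcept"  # invented predicate


def expand(subject, text):
    """Turn a compact 'a;p:b;q:c' string into a list of triples."""
    triples = []
    for token in text.split(";"):
        prefix, sep, local = token.partition(":")
        if not sep:  # no colon, so the token is in the default namespace
            prefix, local = "", token
        triples.append((subject, HAS_CONCEPT, NAMESPACES[prefix] + local))
    return triples


def collapse(triples):
    """Inverse of expand(): recover the compact string from the triples."""
    tokens = []
    for _subject, _predicate, obj in triples:
        for prefix, uri in NAMESPACES.items():
            if obj.startswith(uri):
                local = obj[len(uri):]
                tokens.append(f"{prefix}:{local}" if prefix else local)
                break
    return ";".join(tokens)


triples = expand("http://example.org/event/1", "em.opt;sv:GRB;xyz:niftyIdea")
for triple in triples:
    print(triple)
print(collapse(triples))
```

The round trip through expand() and collapse() is the one-to-one
mapping: the user writes and reads only the short string, and the
triples exist only behind the scenes.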

* The trouble with NOT

Going back a bit, the fundamental trouble with the notion of not(X) is  
the 'Open World Assumption'.  If a galaxy is not listed as being a  
member of a cluster, is this because it isn't a member of a cluster,  
it isn't _known_ to be, or it's known but isn't listed as such?  The  
'Open World Assumption' is the acknowledgement that not stating  
something doesn't mean it's not true.  This can often be handled one  
way or another, but it's a subtle gotcha so common that it's got its  
own name.
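
The galaxy example can be sketched in a few lines of Python.  This is
only an illustration: the galaxy and cluster names are invented, and
returning None for 'unknown' is one possible convention, not a
standard one.

```python
from typing import Optional

# Hypothetical statements; the galaxy and cluster names are invented.
ASSERTED_MEMBER = {("galaxy:A", "cluster:X")}
ASSERTED_NOT_MEMBER = {("galaxy:B", "cluster:X")}


def is_member_closed(galaxy: str, cluster: str) -> bool:
    """Closed-world reading: anything not listed is taken to be false."""
    return (galaxy, cluster) in ASSERTED_MEMBER


def is_member_open(galaxy: str, cluster: str) -> Optional[bool]:
    """Open-world reading: True or False only when explicitly stated,
    None when nothing has been said either way."""
    if (galaxy, cluster) in ASSERTED_MEMBER:
        return True
    if (galaxy, cluster) in ASSERTED_NOT_MEMBER:
        return False
    return None


for g in ("galaxy:A", "galaxy:B", "galaxy:C"):
    print(g, is_member_closed(g, "cluster:X"), is_member_open(g, "cluster:X"))
```

For galaxy:C, about which nothing is stated, the closed-world function
answers False while the open-world one answers None.  That difference
is exactly what makes not(X) awkward: a query for 'not a cluster
member' has to decide which of the two it means.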

And finally: Doug:

> A real-world example of this from outside astronomy is open source
> software - why do those guys keep developing new software/standards,
> when so much more-or-less relevant stuff already exists?

Because it's more fun.  And because scratch-my-itch software  
processes don't have the same cost and time imperatives that we have  
as a professional software development community.



All the best,

Norman


-- 
------------------------------------------------------------
Norman Gray  :  http://nxg.me.uk
eurovotech.org  :  University of Leicester, UK


