xml schema for skos

Thu Apr 9 03:47:20 PDT 2009

Gerard, hello.

On 2009 Apr 9, at 09:54, Gerard wrote:

> SimDB is a protocol for querying a database containing metadata
> describing simulations.
> It is based on a data model for simulations that contains various
> entities (classes) that require an indication of their semantic
> meaning.

That sounds excellent.  The only note of caution is that
vocabularies/thesauri are really for _searching_ (broadly considered);
if you want meaning, you need an ontology [1].  Thesauri are roughly at
the boundary between the two, but still on the semi-formal side, and
therefore easier and cheaper to create and work with, while still
providing some 'meaning'.  I think that vocabularies are the right
solution for the SimDB problem, and I mention this issue only to note
that a future desire for more exotic things might require some
re-engineering.

> Querying is performed with ADQL, as SimDB defines itself to be a TAP
> service with a fixed relational TAP_SCHEMA derived from the data
> model. In this relational model there will be a table
> TargetObjectType with a column "label", which again should have
> values from the same vocabulary.
> For clients of the SimDB service, knowing that there is a common
> vocabulary should help in writing queries, for example to find
> simulations of "Galaxy Cluster"-s. E.g. something like
>
> select s.*
>  from simulation s
>  ,    targetobjecttype t
> where t.label='Galaxy Cluster'
>   and s.simulationId=t.simulationid

I'll draw attention to a few things:

   * In the case where an ADQL term is being generated by a tool, or  
generated by a user using some menu-driven interface, it might be  
feasible to include the URI in the search query (perhaps with some  
namespacing for compactness?).  One point of the URI, after all, is to  
have a completely unambiguous name for the concept, which is free of  
all whitespace/case/language complications.

   * A SKOS Concept has at most one prefLabel and any number of altLabel
properties per language, so you could potentially use any of these in
your search term.

   * SKOS concepts also optionally have a 'notation' property, such as  
'c1.4.3.2' -- it's effectively another label, but one which is in  
principle parseable.
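
To make the points above a little more concrete, here is a minimal Python sketch of resolving whatever the user typed (a prefLabel, an altLabel or a notation) to a concept URI, and then putting the URI rather than the label into the query.  It is only an illustration under assumptions: the example.org URI, the labels, and the `concepturi` column are all invented here, and nothing in SimDB or ADQL prescribes them.

```python
# Hypothetical concept record: the URI, labels and notation are invented
# for illustration and are not taken from any published vocabulary.
CONCEPTS = {
    "http://example.org/vocab/targets#GalaxyCluster": {
        "prefLabel": {"en": "Galaxy Cluster"},
        "altLabel": {"en": ["Cluster of Galaxies", "Galaxy Clusters"]},
        "notation": "c1.4.3.2",
    },
}


def normalise(text):
    """Fold case and collapse whitespace so label matching is forgiving."""
    return " ".join(text.split()).casefold()


def find_concept(term, lang="en"):
    """Return the URI whose prefLabel, altLabel or notation matches term."""
    wanted = normalise(term)
    for uri, concept in CONCEPTS.items():
        candidates = [concept["prefLabel"].get(lang, "")]
        candidates += concept["altLabel"].get(lang, [])
        candidates.append(concept["notation"])
        if wanted in (normalise(c) for c in candidates if c):
            return uri
    return None


def adql_for(term):
    """Build a query on the (assumed) concepturi column, not the label."""
    uri = find_concept(term)
    if uri is None:
        raise KeyError(term)
    # Double any single quote so the string literal stays well-formed.
    literal = uri.replace("'", "''")
    return (
        "SELECT s.* FROM simulation s, targetobjecttype t "
        "WHERE t.concepturi = '" + literal + "' "
        "AND s.simulationId = t.simulationId"
    )
```

With this, `adql_for("  galaxy   cluster ")` and `adql_for("Cluster of Galaxies")` both end up searching on the same URI, which is the point of having an unambiguous name.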

> This is my first experience with SKOS or RDF. And it seemed that
> what I needed to do was to first find all the subjects that have
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type =
> http://www.w3.org/2004/02/skos/core#Concept and for these find the
> object of the http://www.w3.org/2004/02/skos/core#prefLabel
> predicate.
>
> Doing that might require two steps when using the find(triple)
> methods on the JRDF Graph class, whereas using a SPARQL query the
> information can be obtained in one go.

Exactly.

>      String queryString =
>        " SELECT ?x ?y " +
>        "  WHERE { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> " +
>        "<http://www.w3.org/2004/02/skos/core#Concept> ." +
>        "          ?x <http://www.w3.org/2004/02/skos/core#prefLabel> ?y }";

That can be slightly compacted to:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?x ?y
WHERE {
   ?x a skos:Concept.
   ?x skos:prefLabel ?y
}

(though brevity isn't your goal here, I know).
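
For comparison, the two-step lookup you describe looks roughly like this when written out by hand.  This is a plain-Python sketch over a toy list of triples, not the JRDF or Jena API: the `find` helper and the example.org subjects are invented for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Toy graph; the example.org subjects are invented for illustration.
TRIPLES = [
    ("http://example.org/v#Galaxy", RDF_TYPE, SKOS + "Concept"),
    ("http://example.org/v#Galaxy", SKOS + "prefLabel", "Galaxy"),
    ("http://example.org/v#Cluster", RDF_TYPE, SKOS + "Concept"),
    ("http://example.org/v#Cluster", SKOS + "prefLabel", "Galaxy Cluster"),
    # Has a prefLabel but is not typed as a Concept, so it must not match.
    ("http://example.org/v#Scheme", SKOS + "prefLabel", "Some scheme"),
]


def find(s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def concept_labels():
    """Step 1: find the Concepts.  Step 2: find each one's prefLabel."""
    rows = []
    for subject, _, _ in find(p=RDF_TYPE, o=SKOS + "Concept"):
        for _, _, label in find(s=subject, p=SKOS + "prefLabel"):
            rows.append((subject, label))
    return rows
```

Calling `concept_labels()` returns one (subject, label) pair for each typed Concept and skips the untyped subject.  The nested loop is the join that the two triple patterns in the SPARQL query express declaratively.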

> In Jena this executes pretty quickly (< 3sec), similar code in JRDF
> did not return before my patience ran out.
> Jena requires indeed quite some more and larger jar files on the
> classpath than JRDF.
> I have not tried Sesame yet, but will try it out using the smaller
> jar file Norman.
> I guess I may not need SPARQL, but it is a nice way to mix and match
> the SKOS information as suited for one's purposes.

Jena generally isn't hugely speedy (and RDF triplestores are currently  
a very long way behind RDBMSs in performance terms), but it's probably  
perfectly adequate for the requirements you have.

SPARQL is generally a lot more flexible and maintainable as a way of  
interrogating your data, and so much preferable to grubbing around in  
the triples by hand.  If you don't require a minimal classpath, then I  
don't think there's any downside to using SPARQL.

> proposed usage of SKOS vocabularies.

> Btw, we may need to define some of those, though hopefully we can
> use existing ones.

RDF generally has no problems mixing things from different
namespaces/schemas -- RDF's big strength is for this sort of heterogeneous
integration.  This means that having multiple semi-standardised  
vocabularies should not be automatically ruled out as a nightmare, as  
would be natural for someone with a more schema-focused mindset.

Thus one could imagine one well-known vocabulary which has terms A and  
B, and

     v1:A skos:narrower v1:B (B is a narrower term than A)

Then another, more specialised vocabulary might have a term
C, and

     v1:B skos:narrowMatch v2:C

(v2:C is a narrower term than v1:B, in a different vocabulary; there
are also exactMatch and broadMatch).  If you then told your SPARQL
endpoint that skos:narrowMatch should be treated like skos:narrower,
and that skos:narrower should be treated as transitive, then a SPARQL query

select ?x ?y
where {
   ?x skos:narrower ?y
}

would produce

?x, ?y
v1:A, v1:B
v1:B, v2:C
v1:A, v2:C

...even though v1:A skos:narrower v2:C was never explicitly asserted.
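
What the endpoint would be doing behind the scenes can be sketched by hand as a naive closure over the asserted triples.  This is only an illustration, assuming the two rules just described (narrowMatch counts as narrower, and narrower is treated as transitive); the function name and the string form of the triples are invented here, and a real reasoner would be configured rather than hand-coded.  Note that SKOS itself leaves skos:narrower non-transitive and offers skos:narrowerTransitive for the closure.

```python
def infer_narrower(asserted):
    """Return all (broader, narrower) pairs implied by the asserted triples.

    Assumes two rules: narrowMatch counts as narrower, and narrower is
    treated as transitive.
    """
    pairs = {
        (s, o) for s, p, o in asserted
        if p in ("skos:narrower", "skos:narrowMatch")
    }
    changed = True
    while changed:  # naive fixed-point iteration; fine for small vocabularies
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


# The two statements asserted in the text, written as plain strings.
ASSERTED = [
    ("v1:A", "skos:narrower", "v1:B"),
    ("v1:B", "skos:narrowMatch", "v2:C"),
]
```

Here `infer_narrower(ASSERTED)` contains the pair ("v1:A", "v2:C"), although no triple linking those two terms was ever asserted.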

It's looking good!

All the best,

Norman

[1] There's more than one definition of where the boundary is between  
thesauri (vocabularies with structure) and ontologies, which implies  
that the boundary is fluid.  However, a good and common definition is
that ontologies have a hierarchy formed from formal IsA relations.   
Thus 'car wheel' hasBroaderTerm 'car' is OK in a thesaurus, but 'car  
wheel' IsA 'car' is an error in an ontology.

-- 
Norman Gray  :  http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester