xml schema for skos
Norman Gray
norman at astro.gla.ac.uk
Thu Apr 9 03:47:20 PDT 2009
Gerard, hello.
On 2009 Apr 9, at 09:54, Gerard wrote:
> SimDB is a protocol for querying a database containing metadata
> describing
> simulations.
> It is based on a data model for simulations that contains various
> entities
> (classes) that
> require an indication of their semantic meaning.
That sounds excellent. The only note of caution is that vocabularies/
thesauri are really for _searching_ (broadly considered); if you want
meaning, you need an ontology [1]. Thesauri are roughly at the
boundary between the two, but still on the semi-formal side, and
therefore easier and cheaper to create and work with, while still
providing some 'meaning'. I think that vocabularies are the right
solution for the SimDB problem, and I mention this issue only to note
that a future desire for more exotic things might require some re-
engineering.
> Querying is performed with ADQL, as SimDB defines itself to be a TAP
> service
> with a fixed relational TAP_SCHEMA derived from the data model. In
> this
> relational model there will be a table TargetObjectType with a column
> "label", which again should have values from the same vocabulary.
> For clients of the SimDB service, knowing that there is a common
> vocabulary
> should help in writing queries for example to find simulations of
> "Galaxy
> Cluster"-s. E.g. something like
>
> select s.*
> from simulation s
> , targetobjecttype t
> where t.label='Galaxy Cluster'
> and s.simulationId=t.simulationid
I'll draw attention to a few things:
* In the case where an ADQL term is being generated by a tool, or
generated by a user using some menu-driven interface, it might be
feasible to include the URI in the search query (perhaps with some
namespacing for compactness?). One point of the URI, after all, is to
have a completely unambiguous name for the concept, which is free of
all whitespace/case/language complications.
* A SKOS Concept has at most one prefLabel, and any number of altLabel
properties, per language, so you could potentially use any of these in
your search term.
* SKOS concepts also optionally have a 'notation' property, such as
'c1.4.3.2' -- it's effectively another label, but one which is in
principle parseable.
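
To make those three points concrete, here is a minimal sketch in plain Python (no RDF library) of resolving whatever the user typed, be it a prefLabel, an altLabel or a notation, to the concept URI that a tool could then put in the query. The example.org URI, the labels, the notation value and the helper names normalise and resolve_term are all made up for illustration; nothing here is part of SimDB or SKOS itself.

```python
# Hypothetical vocabulary fragment: concept URI -> its SKOS-style labels.
# The URI, labels and notation are invented for illustration only.
CONCEPTS = {
    "http://example.org/vocab#GalaxyCluster": {
        "prefLabel": {"en": "Galaxy Cluster", "fr": "Amas de galaxies"},
        "altLabel": {"en": ["Cluster of Galaxies"]},
        "notation": "c1.4.3.2",
    },
}


def normalise(text):
    """Fold case and collapse whitespace so label matching is forgiving."""
    return " ".join(text.split()).casefold()


def resolve_term(term):
    """Return the URIs of all concepts for which `term` matches a prefLabel,
    an altLabel (in any language) or the notation."""
    wanted = normalise(term)
    matches = []
    for uri, props in CONCEPTS.items():
        labels = list(props["prefLabel"].values())
        for alts in props["altLabel"].values():
            labels.extend(alts)
        labels.append(props["notation"])
        if any(normalise(label) == wanted for label in labels):
            matches.append(uri)
    return matches
```

With this, resolve_term("galaxy   CLUSTER") and resolve_term("c1.4.3.2") both give back the same single URI, which is the point of having the URI as the unambiguous name.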
> This is my first experience with SKOS or RDF. And it seemed that
> what I
> needed to do was to first find
> all the subjects that have http://www.w3.org/1999/02/22-rdf-syntax-ns#type
> =
> http://www.w3.org/2004/02/skos/core#Concept and for these find the
> object of
> the http://www.w3.org/2004/02/skos/core#prefLabel predicate.
>
> Doing that might require two steps when using the find(triple)
> methods on the
> JRDF Graph class, whereas using a SPARQL query the information can be
> obtained in one go.
Exactly.
> String queryString =
>     " SELECT ?x ?y " +
>     " WHERE { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> " +
>     "            <http://www.w3.org/2004/02/skos/core#Concept> . " +
>     "         ?x <http://www.w3.org/2004/02/skos/core#prefLabel> ?y }";
That can be slightly compacted to:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?x ?y
WHERE {
?x a skos:Concept.
?x skos:prefLabel ?y
}
(though brevity isn't your goal here, I know).
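
For comparison, this is roughly what the two-step, by-hand version of the same lookup looks like over raw triples. It is a sketch in plain Python, not the JRDF or Jena API: the find function is a stand-in defined in the snippet itself, and the example.org data is invented.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Invented sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("http://example.org/vocab#Galaxy", RDF_TYPE, SKOS + "Concept"),
    ("http://example.org/vocab#Galaxy", SKOS + "prefLabel", "Galaxy"),
    ("http://example.org/vocab#GalaxyCluster", RDF_TYPE, SKOS + "Concept"),
    ("http://example.org/vocab#GalaxyCluster", SKOS + "prefLabel", "Galaxy Cluster"),
    # Not a Concept, so it must not show up in the result.
    ("http://example.org/vocab#scheme", RDF_TYPE, SKOS + "ConceptScheme"),
    ("http://example.org/vocab#scheme", SKOS + "prefLabel", "Example scheme"),
]


def find(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def concept_labels(triples):
    """Step 1: find every subject typed as skos:Concept.
    Step 2: for each such subject, look up its skos:prefLabel values."""
    pairs = []
    for subject, _, _ in find(triples, p=RDF_TYPE, o=SKOS + "Concept"):
        for _, _, label in find(triples, s=subject, p=SKOS + "prefLabel"):
            pairs.append((subject, label))
    return pairs
```

The join between the two steps is written out by hand here, whereas in the SPARQL version the shared ?x variable expresses it directly.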
> In Jena this executes pretty quickly (< 3sec), similar code in JRDF
> did not
> return before my patience ran out.
> Jena requires indeed quite some more and larger jar files on the
> classpath
> than JRDF.
> I have not tried Sesame yet, but will try it out using the smaller
> jar file
> Norman.
> I guess I may not need SPARQL, but it is a nice way to mix and match
> the
> SKOS information as suited for one's purposes.
Jena generally isn't hugely speedy (and RDF triplestores are currently
a very long way behind RDBMSs in performance terms), but it's probably
perfectly adequate for the requirements you have.
SPARQL is generally a lot more flexible and maintainable as a way of
interrogating your data, and so much preferable to grubbing around in
the triples by hand. If you don't require a minimal classpath, then I
don't think there's any downside to using SPARQL.
> proposed usage of SKOS vocabularies.
> Btw, we may need to define some of those, though hopefully we can
> use
> existing ones.
RDF generally has no problems mixing things from different namespaces/
schemas -- RDF's big strength is for this sort of heterogeneous
integration. This means that having multiple semi-standardised
vocabularies should not be automatically ruled out as a nightmare, as
would be natural for someone with a more schema-focused mindset.
Thus one could imagine one well-known vocabulary which has terms A and
B, and
v1:A skos:narrower v1:B (B is a narrower term than A)
Then another, more specialised vocabulary might have a term
C, and
v1:B skos:narrowMatch v2:C
(v2:C is a narrower term than v1:B, in a different vocabulary; there
are also exactMatch and broaderMatch). If you then told your SPARQL
endpoint that skos:narrowMatch should be regarded as like narrower,
and that narrower should be regarded as transitive, then a SPARQL query
select ?x ?y
where {
?x skos:narrower ?y
}
would produce
?x, ?y
v1:A, v1:B
v1:B, v2:C
v1:A, v2:C
...even though v1:A skos:narrower v2:C was never explicitly asserted.
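
A small sketch of that inference, in plain Python rather than in a real reasoner or SPARQL endpoint, may make the mechanics clearer. It applies exactly the two rules described above (narrowMatch counts as narrower; narrower is treated as transitive) to the two asserted statements. The function name infer_narrower is made up, and the v1:/v2: strings stand in for full URIs.

```python
# The two statements asserted in the example above.
ASSERTED = {
    ("v1:A", "skos:narrower", "v1:B"),
    ("v1:B", "skos:narrowMatch", "v2:C"),
}


def infer_narrower(triples):
    """Return every (x, y) pair for which 'x skos:narrower y' holds once
    narrowMatch is treated as narrower and narrower as transitive."""
    # Rule 1: every narrowMatch statement also counts as a narrower statement.
    pairs = {
        (s, o) for s, p, o in triples
        if p in ("skos:narrower", "skos:narrowMatch")
    }
    # Rule 2: transitivity, applied repeatedly until nothing new appears.
    changed = True
    while changed:
        changed = False
        for x, y in list(pairs):
            for y2, z in list(pairs):
                if y == y2 and (x, z) not in pairs:
                    pairs.add((x, z))
                    changed = True
    return pairs
```

The returned set contains the pair (v1:A, v2:C), which appears in neither asserted statement, alongside (v1:A, v1:B) and the pair (v1:B, v2:C) that follows from the first rule.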
It's looking good!
All the best,
Norman
[1] There's more than one definition of where the boundary is between
thesauri (vocabularies with structure) and ontologies, which implies
that the boundary is fluid. However a good and common definition is
that ontologies have a hierarchy formed from formal IsA relations.
Thus 'car wheel' hasBroaderTerm 'car' is OK in a thesaurus, but 'car
wheel' IsA 'car' is an error in an ontology.
--
Norman Gray : http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester