xml schema for skos

Thu Apr 9 04:43:14 PDT 2009

Thanks Norman 

> > SimDB is a protocol for querying a database containing metadata 
> > describing simulations.
> > It is based on a data model for simulations that contains various 
> > entities
> > (classes) that
> > require an indication of their semantic meaning.
> 
> That sounds excellent.  The only note of caution is that 
> vocabularies/ thesauri are really for _searching_ (broadly 
> considered); if you want meaning, you need an ontology [1].  
As you have seen from the use case, searching is the main reason for
bringing in the vocabularies.
And as we do not have any formal ontologies accepted yet in the VO (afaik),
and no doubt they are going to be much more complex, I am happy to stick
with SKOS for the moment.

> ...
> I'll draw attention to a couple of things:
> 
>    * In the case where an ADQL term is being generated by a 
> tool, or generated by a user using some menu-driven 
> interface, it might be feasible to include the URI in the 
> search query (perhaps with some namespacing for 
> compactness?).  One point of the URI, after all, is to have a 
> completely unambiguous name for the concept, which is free of 
> all whitespace/case/language complications.
> 
In dropdowns, URIs can clearly be hidden behind a label.
I am assuming that ADQL is written by users and not generated.
In that case a simpler label is preferable, I think, as long as the labels
are unique within the vocabulary.
Is there such a constraint on preferred labels in SKOS vocabularies?
Re namespaces, I also have so far assumed that a vocabulary has a single
namespace for all its concepts.
Is that correct?

But I do think this needs further discussion for SimDB.
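
To make concrete what I mean by "unique within the vocabulary", here is a
minimal Python sketch of a check one could run over a vocabulary. The
example URIs and labels are invented, and comparing labels case-insensitively
is my own assumption about what users typing ADQL by hand would expect.

```python
from collections import defaultdict


def duplicate_pref_labels(concepts):
    """Return {(lang, label): [uris]} for labels used by more than one concept.

    `concepts` is an iterable of (uri, lang, pref_label) tuples.
    """
    by_label = defaultdict(set)
    for uri, lang, label in concepts:
        # Assumption: labels are compared case-insensitively and without
        # surrounding whitespace, since users would type them by hand.
        by_label[(lang, label.strip().lower())].add(uri)
    return {key: sorted(uris) for key, uris in by_label.items() if len(uris) > 1}


# Hypothetical vocabulary content, for illustration only.
CONCEPTS = [
    ("http://example.org/vocab#nbody", "en", "N-body"),
    ("http://example.org/vocab#sph", "en", "SPH"),
    ("http://example.org/vocab#nbody2", "en", "n-body "),
]

if __name__ == "__main__":
    for (lang, label), uris in duplicate_pref_labels(CONCEPTS).items():
        print(lang, label, uris)
```

An empty result would mean that a bare label is enough to identify a concept
in the query.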

>    * a SKOS Concept has one prefLabel and multiple altLabel 
> properties per language, so you could potentially use any of 
> these in your search term.
> 
If we were to allow all of these, I guess that puts a burden on the
client user, as now (s)he may have to include all possible labels in the
ADQL query.
If possible I would like to have a list of preferred labels only (as long as
they are unique).

But to be discussed further in SimDB.

>    * SKOS concepts also optionally have a 'notation' 
> property, such as 'c1.4.3.2' -- it's effectively another 
> label, but one which is in principle parseable.
> 
Same as previous.

> 
> Jena generally isn't hugely speedy (and RDF triplestores are 
> currently a very long way behind RDBMSs in performance 
> terms), but it's probably perfectly adequate for the 
> requirements you have.
> 
> SPARQL is generally a lot more flexible and maintainable as a 
> way of interrogating your data, and so much preferable to 
> grubbing around in the triples by hand.  If you don't require 
> a minimal classpath, then I don't think there's any downside 
> to using SPARQL.
> 
Most of the code may need to be run only once at build time, or on a server
at runtime, and I do indeed not anticipate major problems with Jena.
Does your Sesame recompilation have SPARQL functionality?

> > proposed usage of SKOS vocabularies.
> 
> > Btw, we may need to define some of those, though hopefully we can
> > use existing ones.
> 
> RDF generally has no problems mixing things from different 
> namespaces/ schemas -- RDF's big strength is for this sort of 
> heterogeneous integration.  This means that having multiple 
> semi-standardised vocabularies should not be automatically 
> ruled out as a nightmare, as would be natural for someone 
> with a more schema-focused mindset.
> 
> Thus one could imagine one well-known vocabulary which has 
> terms A and B, and
> 
>      v1:A skos:narrower v1:B (B is a narrower term than A)
> 
> Then another, more specialised vocabulary might 
> have a term C, and
> 
>      v1:B skos:narrowMatch v2:C
> 
> (v2:C is a narrower term than v1:B, in a different 
> vocabulary; there are also exactMatch and broaderMatch).  If 
> you then told your SPARQL endpoint that skos:narrowMatch 
> should be regarded as like narrower, and that narrower should 
> be regarded as transitive, then a SPARQL query
> 
> select ?x ?y
> where {
>    ?x skos:narrower ?y
> }
> 
> would produce
> 
> ?x, ?y
> v1:A, v1:B
> v1:A, v2:C
> 
> ...even though v1:A skos:narrower v2:C was never explicitly asserted.
> 
First, the current proposal is to predefine the vocabulary that is to be
used, so at first we would seem not to need this more complex behaviour.

Second, and this is really the reason for the first point, the current
proposal is to use TAP+ADQL for the actual querying.
So it is really SQL against a relational database, and these do not support
all (or any) of these features.
Users can OR terms together to mimic, for example, the narrower/broader
relations, and I hope that is sufficient at first.
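
As a rough illustration of the OR approach, a client tool could expand a
term into its narrower terms and build the condition for the user. This is
only a sketch in Python: the labels, the column name `s.protocol_type` and
the helper names are all invented and not part of any SimDB draft.

```python
def narrower_closure(term, narrower):
    """Return `term` followed by every term reachable via `narrower`."""
    ordered, seen, stack = [], set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles or repeated terms
        seen.add(current)
        ordered.append(current)
        stack.extend(narrower.get(current, []))
    return ordered


def or_clause(column, labels):
    """Build an SQL/ADQL condition that matches any of the given labels."""
    # Single quotes inside a label are doubled, as in standard SQL literals.
    parts = ["{} = '{}'".format(column, label.replace("'", "''")) for label in labels]
    return "(" + " OR ".join(parts) + ")"


# Hypothetical narrower relations between preferred labels.
NARROWER = {
    "simulation": ["hydrodynamic"],
    "hydrodynamic": ["SPH"],
}

if __name__ == "__main__":
    labels = narrower_closure("hydrodynamic", NARROWER)
    print(or_clause("s.protocol_type", labels))
```

The relational database then only ever sees plain equality tests, which is
all I would want to rely on in the first version.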

Otherwise, but not in the first version I hope, I'd think it is possible to
support this if we add some tables storing the vocabularies and replicating
such relations in a Narrower table for example.
This has the danger of getting messy pretty quickly, especially if you
really want narrowerTransitive etc.
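
To give an idea of what such a table might involve, here is a small sketch
using Python's sqlite3 module and the v1:A, v1:B, v2:C example from above.
The table and column names are invented, the narrowMatch link is simply
stored as another narrower row, and the recursive query is SQLite's SQL,
not something I would assume a TAP/ADQL service accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table holding the directly asserted narrower relations.
conn.execute("CREATE TABLE Narrower (broader_uri TEXT, narrower_uri TEXT)")
conn.executemany(
    "INSERT INTO Narrower VALUES (?, ?)",
    [
        ("v1:A", "v1:B"),  # v1:A skos:narrower v1:B
        ("v1:B", "v2:C"),  # v1:B skos:narrowMatch v2:C, treated as narrower
    ],
)

# Transitive closure of the Narrower table via a recursive common table
# expression; UNION (rather than UNION ALL) drops duplicate pairs.
TRANSITIVE_SQL = """
WITH RECURSIVE closure(broader_uri, narrower_uri) AS (
    SELECT broader_uri, narrower_uri FROM Narrower
    UNION
    SELECT c.broader_uri, n.narrower_uri
    FROM closure AS c
    JOIN Narrower AS n ON n.broader_uri = c.narrower_uri
)
SELECT broader_uri, narrower_uri
FROM closure
ORDER BY broader_uri, narrower_uri
"""

rows = conn.execute(TRANSITIVE_SQL).fetchall()

if __name__ == "__main__":
    for broader, narrower in rows:
        print(broader, narrower)
```

The pair (v1:A, v2:C) appears in the result although it was never inserted.
In a real service the closure would probably have to be computed when the
vocabulary is loaded and stored as ordinary rows, which is where I expect
the mess to start.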

Alternatively, and from your point of view likely preferably, we may try to
find a suitable RDF representation of (parts of) the data model and use an
appropriate (and fast) query engine.
Again this would go beyond the current proposal and I hope will not be
required yet.

Thanks

Gerard