xml schema for skos

Thu Apr 9 01:54:19 PDT 2009

Hi Norman, Tony and others

Thanks for your comments.
I first tried out JRDF because it was the first Google result for "java
rdf".
I switched to Jena (the first alternative listed on
http://jrdf.sourceforge.net/) to check out its SPARQL support, and stayed
with it because my query worked there (and not in JRDF).

As promised, here is an attempt to describe the use case derived from the
SimDB proposal.

SimDB is a protocol for querying a database containing metadata describing
simulations.
It is based on a data model for simulations that contains various entities
(classes) that
require an indication of their semantic meaning.
As an example, we have a class (TargetObjectType) describing the type of
object that is simulated. 
This class has a "label" attribute that should be given values from a
predefined and agreed upon list of terms identifying astronomical object
types. This should allow someone to describe that a simulation targeted a
"Galaxy Merger", or "Large Scale Structure of the Universe", supposing those
are terms contained in the list.

We propose to use SKOS vocabularies for such lists following the PR
http://www.ivoa.net/Documents/latest/Vocabularies.html. The data model
profile that we use allows modellers to associate the specific vocabulary
with the attribute definition.
Instances of TargetObjectType-s, for example contained in an XML document
describing a simulation, MUST (though Norman argued in the past this should
maybe be a SHOULD) use the preferred label from one of the skos:Concept-s in
the vocabulary (this is the proposal for now, the Semantics WG, in the
person of Norman, has agreed to participate in the further development of
SimDB).

This is (assumed to be) particularly useful when querying the SimDB database
for potentially interesting simulations.
Querying is performed with ADQL, as SimDB defines itself to be a TAP service
with a fixed relational TAP_SCHEMA derived from the data model. In this
relational model there will be a table TargetObjectType with a column
"label", which again should have values from the same vocabulary.
For clients of the SimDB service, knowing that there is a common vocabulary
should help in writing queries for example to find simulations of "Galaxy
Cluster"-s. E.g. something like

select s.*
  from simulation s
  ,    targetobjecttype t
 where t.label='Galaxy Cluster'
   and s.simulationId=t.simulationid

Summarising
1. certain attributes in the data model can be declared to be
<<skosconcept>>-s (using a UML "stereotype")
2. for such attributes one can assign a URL (URI?) identifying the SKOS
RDF/XML vocabulary (using a UML "tag").
3. the possible (valid) values for such attributes are the preferred labels
of the skos:Concept-s in the vocabulary.

The SimDB prototype I mentioned must be able to retrieve the vocabulary
based on the URL and infer the concepts and their preferred labels. This can
be used for example to populate a list of values to be shown in a drop-down
when defining a simulation, or it can be used to validate an uploaded
simulation in XML form, or at least to give a warning when a label does not
exist in the list.
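
A minimal sketch of that validation and drop-down step, assuming the
preferred labels have already been extracted from the vocabulary into a plain
collection of strings. The class and method names (PrefLabelValidator,
isValid, warningFor, dropDownValues) are hypothetical and not part of the
SimDB prototype; exact, case-sensitive matching is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper: checks a "label" value against a vocabulary's preferred labels. */
final class PrefLabelValidator {
    private final Set<String> prefLabels = new LinkedHashSet<>();

    PrefLabelValidator(Iterable<String> labels) {
        for (String label : labels) {
            prefLabels.add(label);
        }
    }

    /** True only if the label is exactly one of the preferred labels (assumption: case-sensitive). */
    boolean isValid(String label) {
        return prefLabels.contains(label);
    }

    /** Returns a warning message for an unknown label, or empty if the label is valid. */
    Optional<String> warningFor(String label) {
        if (isValid(label)) {
            return Optional.empty();
        }
        return Optional.of("Label '" + label + "' is not a preferred label in the vocabulary");
    }

    /** Alphabetically sorted labels, e.g. for populating a drop-down list. */
    List<String> dropDownValues() {
        List<String> sorted = new ArrayList<>(prefLabels);
        Collections.sort(sorted);
        return sorted;
    }
}
```

Returning a warning rather than throwing keeps open the question of whether
using a preferred label ends up being a MUST or a SHOULD.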

This is my first experience with SKOS or RDF. And it seemed that what I
needed to do was to first find
all the subjects that have http://www.w3.org/1999/02/22-rdf-syntax-ns#type =
http://www.w3.org/2004/02/skos/core#Concept and for these find the object of
the http://www.w3.org/2004/02/skos/core#prefLabel predicate.
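
To make the two lookups concrete, here is a library-independent sketch that
walks a list of already-parsed triples. The Triple class is a hypothetical
stand-in for whatever the RDF parser produces (it is not the JRDF or Jena
API), and all terms are simplified to plain strings.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ConceptLabels {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept";
    static final String SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel";

    /** Hypothetical stand-in for a parsed triple; subject, predicate and object as strings. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Maps each skos:Concept URI to its preferred label. */
    static Map<String, String> prefLabels(List<Triple> triples) {
        // Step 1: collect every subject whose rdf:type is skos:Concept.
        Set<String> concepts = new LinkedHashSet<>();
        for (Triple t : triples) {
            if (RDF_TYPE.equals(t.predicate) && SKOS_CONCEPT.equals(t.object)) {
                concepts.add(t.subject);
            }
        }
        // Step 2: for those subjects, pick up the object of skos:prefLabel.
        // If a concept has several prefLabels (e.g. one per language), the last one wins here.
        Map<String, String> labels = new LinkedHashMap<>();
        for (Triple t : triples) {
            if (SKOS_PREF_LABEL.equals(t.predicate) && concepts.contains(t.subject)) {
                labels.put(t.subject, t.object);
            }
        }
        return labels;
    }
}
```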

Doing that might require two steps when using the find(triple) methods on the
JRDF Graph class, whereas with a SPARQL query the information can be
obtained in one go. The following simple Java code did this:

      // Imports needed by the snippet (Jena 2.x package names).
      import com.hp.hpl.jena.query.*;
      import com.hp.hpl.jena.rdf.model.*;
      import java.io.InputStream;
      import java.net.URL;

      // Load the SKOS vocabulary (RDF/XML) into an in-memory model.
      String skosFile = "http://www.ivoa.net/Documents/WD/Semantics/"
        + "Vocabularies-20083005/IVOAT/IVOAT.rdf";
      Model model = ModelFactory.createDefaultModel();
      InputStream stream = new URL(skosFile).openStream();
      model.read(stream, "");

      // Select every skos:Concept (?x) together with its preferred label (?y).
      String queryString =
        " SELECT ?x ?y " +
        "  WHERE { ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> " +
        "             <http://www.w3.org/2004/02/skos/core#Concept> ." +
        "          ?x <http://www.w3.org/2004/02/skos/core#prefLabel> ?y }";
      Query query = QueryFactory.create(queryString);
      QueryExecution qexec = QueryExecutionFactory.create(query, model);
      try {
        ResultSet results = qexec.execSelect();
        while (results.hasNext())
        {
          QuerySolution soln = results.nextSolution();
          RDFNode x = soln.get("x");
          RDFNode y = soln.get("y");
          // Print "concept URI : preferred label" for each solution.
          if (x != null && y != null)
            System.out.println(x.toString() + " : " + y.toString());
        }
      } finally { qexec.close(); }

In Jena this executes pretty quickly (< 3 sec); similar code in JRDF did not
return before my patience ran out.
Jena does indeed require quite a few more, and larger, jar files on the
classpath than JRDF.
I have not tried Sesame yet, but will try it out using the smaller set of
jar files Norman suggested.
I guess I may not need SPARQL, but it is a nice way to mix and match the
SKOS information as suits one's purposes.

Hope this was not too long and I would very much appreciate comments on our
proposed usage of SKOS vocabularies.

Btw, we may need to define some of those vocabularies ourselves, though
hopefully we can use existing ones.
Evaluating that requires more details on the data model, which I will keep
for later.

Best regards

Gerard

> -----Original Message-----
> From: Norman Gray [mailto:norman at astro.gla.ac.uk] 
> Sent: Thursday, April 09, 2009 12:59 AM
> To: Gerard
> Cc: 'IVOA semantics'
> Subject: Re: xml schema for skos
> 
> 
> Gerard, hello.
> 
> On 2009 Apr 8, at 13:07, Gerard wrote:
> 
> > I am looking for a Java library, which I did not find under 
> > librdf.org.
> > Its Java binding seems to have been deprecated.
> > I have found jrdf (http://jrdf.sourceforge.net/), will try that out 
> > for now.
> > Afterwards I might vote for a simple (maybe even schema based) XML 
> > serialisation.
> 
> Having blithely said 'use an RDF parser', and then gone to 
> look for one, I realised that they're actually a little 
> thinner on the ground than I expected.
> 
> Jena (as Tony suggested) is a very good 
> RDF/OWL/reasoning/everything-else library, but is huge 
> (because of the .../OWL/reasoning/everything-else bit).  
> Sesame is another well-known one, and smaller, but still on 
> the heavyweight side.  Either of Jena or Sesame could claim 
> to be the javax.xml/Xalan/Xerces of the RDF world.  From a 
> quick look, JRDF looks smaller than either of them, but still 
> does a lot more than you really need here, and as a result of 
> being much less well known than Jena and Sesame, is probably 
> less battle-tested.
> 
> So I had a closer look at Sesame, and have concluded that 
> this is probably the one you want to use.  The Jena 
> distribution is pretty monolithic, but Sesame is distributed 
> in a much more componentised form, and by cherrypicking the 
> right .jars from Sesame, you can assemble a pretty 
> lightweight parser for yourself.  Mmm: I hadn't previously 
> known that about Sesame!
> 
> I've attached a 20kB tarball which contains an example 
> program which parses RDF in two serialisations and then spits 
> out the result in (trivial) n-triples format.  The Sesame 
> parsers are clearly modelled on the SAX ones, with a parser 
> which processes an input stream, and calls a handler which 
> assembles the model/graph.
> 
> The Makefile in the attachment shows which subset of Sesame 
> jars are required.  When the whole lot is assembled into a 
> single jar file it comes to 250kB, which could probably be 
> whittled down a lot with jar-optimisation.
> 
> Rick said:
> 
> > I hate to mention it (well, not really), but I long ago said that
> > part of the IVOA vocabulary proposal should have been suggestions
> > for the exact format of RDF/XML so that a formal IVOA RDF/XML schema
> > could have been reality, making the practical parsing of
> > vocabularies much simpler (for all of the Roy William's and Alasdair
> > Allen's of the world that don't want to add Turtle, n-triple, OWL,
> > and arbitrarily complex RDF parsers to their list of needed tools).
> 
> Yes: as I noted, we discussed this, but the fallout from 
> specifying an RDF/XML profile is non-negligible.
> 
> The point is that you _don't_ need "Turtle, n-triple, OWL, 
> and arbitrarily complex RDF parsers" to use this stuff -- 
> just as you don't need XPath, XQuery, XProc, XMI, and 
> arbitrarily complex schema languages to parse bog-standard .xml files.
> 
> I'd be keen to hear your experiences with this, Gerard.  If 
> it's a big hassle to integrate with what you've currently 
> got, it'd certainly be possible to revisit the RDF/XML 
> profile idea, but my impression from this little experiment 
> is that 'use a subset of Sesame jars' is a recommendation I 
> can make with the honest expectation that it'll be useful and 
> feasible.
> 
> SKOS is simple enough that it's representable as XML without 
> violence, but XML isn't the answer to everything any more 
> than Fortran is.
> 
> Yours in all pragmatism,
> 
> Norman
> 
> 
> --
> Norman Gray  :  http://nxg.me.uk
> Dept Physics and Astronomy, University of Leicester
>