Vocabularies validator

Norman Gray norman at astro.gla.ac.uk
Mon Jun 22 05:09:26 PDT 2009


Greetings, all.

I've written a validator to check SKOS vocabularies for conformance  
with the Vocabularies PR document.

You can find it at
<http://code.google.com/p/volute/source/browse/#svn/trunk/projects/vocabularies/src/code/validator>
or for download at
<http://volute.googlecode.com/files/VocabularyValidator-0.1.jar>,
and you can apply it to your developing vocabularies for your
edification.  If anyone disagrees with the validator's assessments,
please do let me know.

Slightly embarrassingly, but not entirely surprisingly, this exercise
uncovered a couple of ambiguities in the document, and some parts of
the published vocabularies which were not conformant.  A summary of
the changes to the document is below.  All of these are fine detail,
and I do not believe any is significant enough to interrupt the RFC
process.

The consequent line-by-line changes to the document can be found at
<http://code.google.com/p/volute/source/browse/trunk/projects/vocabularies/doc/vocabularies.xml>
(this is revision 1097), and the current formatted document is at
<http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/>.

This has been a useful exercise.  I warmly encourage other working  
groups to go through the process of developing a validator for a  
proposed standard.

Best wishes,

Norman







What does this validator check?
-------------------------------

The validator aims to check as many of the document's normative
remarks as possible.  These primarily appear in section 3,
[#publishing].

   * Section 3.1.1 [#req-derefns].  Checks the "303-dance" (the HTTP
     303 See Other redirect from the namespace URI to the vocabulary
     document).

   * Section 3.1.2 [#req-availability].  Can't be tested.

   * Section 3.1.3 [#req-distformat].  Tested as part of the Section
     3.1.1 tests.

   * Section 3.1.4 [#versioning].  Can't be tested, realistically.

   * Section 3.1.5 [#req-labels].  Tested.  See the notes below about
     extended requirements: this checks that all the labels have a
     language tag, that at least one of them is @en, and that there is
     a prefLabel @en.

   * Section 3.1.6 [#req-sourcefiles].  Negative requirement, not
     tested.

   * Section 3.2 [#practices].  Checks practices-id (concept regexp),
     practices-lang (require language tag), practices-relations
     (reciprocated relationships), and practices-singlescheme (single
     ConceptScheme).

     Practices #practices-readable, #practices-labelnumber,
     #practices-mappings and #practices-existing cannot be checked
     mechanically.

     Practices #practices-conceptmd and #practices-topconcepts could be
     checked mechanically, but as the spec notes, these practices could
     quite reasonably be violated, and a validator can't tell a
     reasonable violation from a mistake.
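
The label, relationship and single-scheme checks are simple enough to
sketch.  What follows is not the validator's code, just a minimal
Python illustration, assuming the vocabulary has already been parsed
into a set of (subject, predicate, object) triples with each literal
written as a (text, language) pair.  The names check_vocabulary and
lang_of are hypothetical.

```python
# A minimal sketch of three mechanical checks, over an in-memory set of
# (subject, predicate, object) triples.  ASSUMPTION: literals are modelled
# as (text, language) tuples, with language=None when untagged; a real
# validator would parse the RDF with a proper library instead.

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LABEL_PROPS = {SKOS + "prefLabel", SKOS + "altLabel", SKOS + "hiddenLabel"}
# Each relation, and the relation that must reciprocate it.
INVERSES = {
    SKOS + "broader": SKOS + "narrower",
    SKOS + "narrower": SKOS + "broader",
    SKOS + "related": SKOS + "related",
}


def lang_of(obj):
    """Language tag of a literal, or None for untagged literals and URIs."""
    return obj[1] if isinstance(obj, tuple) else None


def check_vocabulary(triples):
    """Return a list of problem descriptions; empty if none were found."""
    triples = set(triples)
    problems = []

    # practices-singlescheme: exactly one skos:ConceptScheme.
    schemes = {s for (s, p, o) in triples
               if p == RDF_TYPE and o == SKOS + "ConceptScheme"}
    if len(schemes) != 1:
        problems.append(
            f"expected exactly one ConceptScheme, found {len(schemes)}")

    # req-labels / practices-lang: every label tagged, at least one @en,
    # and a prefLabel @en.  Only the exact tag "en" is accepted here.
    concepts = {s for (s, p, o) in triples
                if p == RDF_TYPE and o == SKOS + "Concept"}
    for c in sorted(concepts):
        labels = [(p, o) for (s, p, o) in triples
                  if s == c and p in LABEL_PROPS]
        if any(lang_of(o) is None for (_, o) in labels):
            problems.append(f"{c}: label without a language tag")
        if not any(lang_of(o) == "en" for (_, o) in labels):
            problems.append(f"{c}: no label @en")
        if not any(p == SKOS + "prefLabel" and lang_of(o) == "en"
                   for (p, o) in labels):
            problems.append(f"{c}: no prefLabel @en")

    # practices-relations: each broader/narrower/related link must be
    # stated explicitly in both directions.  key=str keeps the ordering
    # deterministic even if an object is unexpectedly a literal tuple.
    for (s, p, o) in sorted((t for t in triples if t[1] in INVERSES),
                            key=str):
        if (o, INVERSES[p], s) not in triples:
            problems.append(f"{s} {p} {o} is not reciprocated")

    return problems
```

For a vocabulary in which ex:dwarf skos:broader ex:star is asserted
without the matching ex:star skos:narrower ex:dwarf, this reports that
single unreciprocated link.  Requiring both directions explicitly is
stricter than SKOS itself, where a reasoner could infer the inverse.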

Some problems found
-------------------

Working through the document with the validator in mind, I found a
couple of problems with it.

   * Some (sub)sections with requirements didn't have IDs; added.

   * Section 3.1.1 (#req-derefns) says that dereferencing the namespace
     SHOULD provide RDF, but 3.1.3 (#req-distformat) says it MUST.  We
     should go with the latter.

   * MIME types for Turtle.  Only application/rdf+xml is registered,
     but http://www.w3.org/TeamSubmission/turtle/ anticipates
     text/turtle, and says that application/x-turtle should be accepted
     pre-registration.

   * Section 3.1.5 (#req-labels) doesn't require that all labels have a
     language tag.  It should, and it should also require that at least
     one of them be @en.

   * ...but that conflicts with Section 3.2 item 5, which only says
     that they SHOULD have a language tag.  They now MUST have a
     language tag.

   * I'd never required that there be only one ConceptScheme in a
     vocabulary.  This is now a MUST.

   * Emphasise that people should use the DC Terms namespace, rather
     than the older DC Elements namespace, and that dct:creator is an
     object property.

   * There seems to be no good reason to forbid the digits [0-9] at the
     start of a concept name, so I've relaxed that restriction.
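
Two of the changes above, the relaxed concept-name pattern and the
Turtle MIME types, can be illustrated in a few lines.  This is a
hypothetical Python sketch, not the validator's code: the regular
expression in particular is an assumption, since the exact pattern in
the document isn't quoted here.

```python
import re

# HYPOTHETICAL concept-name pattern for practices-id: an ASCII letter or
# digit first (a leading digit is now allowed), then letters, digits,
# underscores or hyphens.  The pattern in the Vocabularies document may
# differ.
CONCEPT_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")

# Media types accepted as RDF when dereferencing a vocabulary.
RDF_MEDIA_TYPES = {
    "application/rdf+xml",   # registered
    "text/turtle",           # anticipated by the Turtle Team Submission
    "application/x-turtle",  # to be accepted until text/turtle is registered
}


def concept_name_ok(name):
    """True if the whole name matches the concept-name pattern."""
    return CONCEPT_NAME.fullmatch(name) is not None


def is_rdf_media_type(content_type):
    """Check a Content-Type header value, ignoring parameters and case."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in RDF_MEDIA_TYPES
```

With this pattern a name such as "2MASS" is accepted, and a response
with "Content-Type: text/turtle; charset=utf-8" counts as RDF.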




-- 
Norman Gray  :  http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester


