Vocabularies validator
Norman Gray
norman at astro.gla.ac.uk
Mon Jun 22 05:09:26 PDT 2009
Greetings, all.
I've written a validator to check SKOS vocabularies for conformance
with the Vocabularies PR document.
You can browse the source at
<http://code.google.com/p/volute/source/browse/#svn/trunk/projects/vocabularies/src/code/validator>
or download it from
<http://volute.googlecode.com/files/VocabularyValidator-0.1.jar>,
and you can apply it to your developing vocabularies for your
edification. If anyone disagrees with the validator's assessments,
please do let me know.
Slightly embarrassingly, but not entirely surprisingly, this exercise
uncovered a couple of ambiguities in the document, and some parts of
the published vocabularies which were not conformant. A summary of
the changes to the document is below -- all of these are fine detail,
and I do not believe any are significant enough to interrupt the RFC
process.
The consequent line-by-line changes to the document can be found at
<http://code.google.com/p/volute/source/browse/trunk/projects/vocabularies/doc/vocabularies.xml>
(this is revision 1097), and the current formatted document is at
<http://www.astro.gla.ac.uk/users/norman/ivoa/vocabularies/>.
This has been a useful exercise. I warmly encourage other working
groups to go through the process of developing a validator for a
proposed standard.
Best wishes,
Norman
What does this validator check?
-------------------------------
The validator aims to check as many of the document's normative
remarks as possible. These primarily appear in section 3,
[#publishing].
* Section 3.1.1 [#req-derefns]. Checks the 303-dance.
* Section 3.1.2 [#req-availability]. Can't be tested.
* Section 3.1.3 [#req-distformat]. Tested as part of Sect 3.1.1
testing.
* Section 3.1.4 [#versioning]. Can't be tested, realistically.
* Section 3.1.5 [#req-labels]. Tested. See the notes below about
extended requirements -- this checks that all the labels have a
language tag, that at least one of them is @en, and that there is
a prefLabel at en.
* Section 3.1.6 [#req-sourcefiles]. Negative requirement, not
tested.
* Section 3.2 [#practices]. Checks #practices-id (concept regexp),
#practices-lang (require language tag), #practices-relations
(reciprocated relationships), and #practices-singlescheme (single
ConceptScheme).
Practices #practices-readable, #practices-labelnumber,
#practices-mappings, and #practices-existing cannot be checked
mechanically.
Practices #practices-conceptmd and #practices-topconcepts could be
checked mechanically but, as the spec notes, these practices could
quite reasonably be violated, and the validator can't distinguish a
reasonable violation from an error.
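
To make the Section 3.1.5 label check concrete, here is a minimal
sketch in Python. It is not the validator's own code (that is in the
jar above). The function name check_labels and the tuple layout
(concept, property, text, language tag) are hypothetical, chosen only
for illustration. It also assumes that the "@en" requirements apply
per concept and that the tag must be exactly "en".

```python
from collections import defaultdict


def check_labels(labels):
    """Return a list of problem messages for an iterable of labels.

    Each label is a (concept, prop, text, lang) tuple, where lang is
    None when the literal carries no language tag. This layout is an
    assumption made for this sketch only.
    """
    problems = []
    by_concept = defaultdict(list)
    for concept, prop, text, lang in labels:
        by_concept[concept].append((prop, lang))
        # Every label must carry a language tag.
        if not lang:
            problems.append(f"{concept}: {prop} {text!r} has no language tag")
    for concept in sorted(by_concept):
        entries = by_concept[concept]
        # At least one label must be tagged @en.
        if not any(lang == "en" for _, lang in entries):
            problems.append(f"{concept}: no label tagged @en")
        # There must be a prefLabel at @en.
        if ("prefLabel", "en") not in entries:
            problems.append(f"{concept}: no prefLabel at @en")
    return problems


sample = [
    ("#star", "prefLabel", "star", "en"),
    ("#star", "altLabel", "estrella", "es"),
    ("#galaxy", "prefLabel", "Galaxie", None),
]

for message in check_labels(sample):
    print(message)
```

In this sample the "#star" concept passes, while "#galaxy" is reported
three times: once for the untagged label, and once for each of the two
missing "@en" conditions.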
Some problems found
-------------------
Working through the document with the validator in mind, I found a
couple of problems with it.
* Some (sub)sections with requirements didn't have IDs; added.
* Section 3.1.1 (#req-derefns) says that dereferencing the namespace
SHOULD provide RDF, but 3.1.3 (#req-distformat) says it MUST. We
should go with the latter.
* MIME types for Turtle. Only application/rdf+xml is registered,
but http://www.w3.org/TeamSubmission/turtle/ anticipates
text/turtle, and says that application/x-turtle should be accepted
pre-registration.
* Section 3.1.5 (#req-labels) doesn't require that all labels have a
language tag; we should, and require that at least one of them be
@en.
* ...but that conflicts with Section 3.2 item 5, which only says
that they SHOULD have a language tag. They now MUST have a
language tag.
* I'd never required that there be only one ConceptScheme in a
vocabulary. This is now a MUST.
* Emphasise that people should use the DC Terms namespace, rather
than the older DC Elements namespace, and that dct:creator is an
object property.
* There seems to be no good reason to forbid [0-9] as characters at
the start of a concept name, so I've relaxed that.
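
As an illustration of the MIME type point above, a check on a
response's Content-Type header might look like the following Python
sketch. The names ACCEPTED_RDF_TYPES and is_rdf_content_type are
hypothetical, and treating all three types as acceptable is an
assumption drawn from the remarks above, not a statement of what the
validator actually does.

```python
# Media types treated as RDF in this sketch: the registered RDF/XML
# type, plus the two Turtle types mentioned in the Team Submission.
ACCEPTED_RDF_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/x-turtle",
}


def is_rdf_content_type(header_value):
    """Return True if a Content-Type header value names an accepted type.

    Parameters such as "; charset=utf-8" are ignored, and the
    comparison is case-insensitive.
    """
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ACCEPTED_RDF_TYPES


print(is_rdf_content_type("text/turtle; charset=utf-8"))
print(is_rdf_content_type("text/html"))
```
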
--
Norman Gray : http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester