Vocabularies: next steps
Norman Gray
norman at astro.gla.ac.uk
Wed Nov 21 02:18:30 PST 2007
Greetings, all.
I've been somewhat detached from the vocabularies discussion for the
last couple of weeks (hassle on another project). I'd quite like to
move this forward, though, with a view to getting some initial draft
out by the end of the calendar year.
The outstanding issues appear to be:
1. The format of the concept labels (case and character set)
2. The grammatical number of the concept names (singular or plural)
3. The number of top concepts in the IAU thesaurus
4. The number of vocabularies we intend to produce (in particular
whether we produce a pair of `IAU' thesauri, including a corrected
and updated one, and which UCD vocabulary we use), and which
interrelationships we plan to publish
5. Which namespace we use
6. The WD which documents this
7. How we manage the development and release of the vocabularies
*. Any others?
==========
My responses to these are below, but I think the most important one
is [7].
7. How we manage the development and release of the vocabularies
We currently seem to have at least three sets of vocabulary SKOS
files, namely Rick's, Alasdair's and Doug's (my offerings have been
absorbed into Alasdair's), and we have an outline WD in my repository
-- we should perhaps start to share this stuff, and look towards
making at least a first release of all this before the end of this
calendar year.
We could look at some distributed VC system such as Git, or less fun
but more practically share it via SourceForge or Google Code (ie,
repository plus issue tracking). If you all are agreeable, I'll set
up a project in one of the latter and start importing. Shout now.
That project's repository would be the working copy of the various
vocabularies, making releases to the IVOA standards process, which
would be the formal master. There is an issue about the format of
the master source -- namely SKOS as a master, some other private
format which is reprocessed into SKOS, or something like the Lexicon
source file used for the IAU thesaurus; there have been various views
on that which we can resolve fairly shortly as a technicality.
Timescales: How about this?
30 Nov: shared project set up and populated with at least WD text
7 Dec: some SKOS-generating code in the repository, and a set of
technical issues/disagreements identified
19 Dec: loose agreement on technical issues, and a first version
of a WD document with normative SKOS appendices released to the IVOA
documents process from the shared repository
---
1. The format of the concept labels (case and character set)
I'd vote for Rick's option 2, bareboneslowercasenames, but we can
possibly argue about that more constructively during step 2 above.
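
To make the discussion concrete, here is a minimal Python sketch of
what a `bareboneslowercasenames' label might look like. The function
name barebones_label and the exact rule (fold to ASCII, drop anything
that is not a letter or digit, lowercase the rest) are assumptions for
illustration only, not the agreed definition of option 2.

```python
import re
import unicodedata


def barebones_label(term: str) -> str:
    """Reduce a term to lowercase ASCII letters and digits (assumed rule)."""
    # Decompose accented characters so the base letter survives ASCII folding.
    decomposed = unicodedata.normalize("NFKD", term)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Drop spaces, hyphens, punctuation and anything else non-alphanumeric.
    return re.sub(r"[^A-Za-z0-9]", "", ascii_only).lower()


if __name__ == "__main__":
    # Hypothetical example terms, not taken from any of the vocabularies.
    for term in ["Spiral Galaxies", "X-ray sources"]:
        print(term, "->", barebones_label(term))
```

Whatever rule we settle on, it would need a check that two distinct
terms never collapse to the same label.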
---
2. The grammatical number of the concept names (singular or plural)
It seems that English-language thesauri `traditionally' have concepts
labelled with plurals, whereas French and German ones typically have
concepts labelled with singular terms. That's according to
ISO-5964. I don't think it's a big deal, but the examples in the
SKOS docs are indeed either abstract nouns or plurals.
---
3. The number of top concepts in the IAU thesaurus
The plenitude of top concepts does seem odd, I agree, but I don't
think it's necessarily a problem. There's nothing really magical
about a tree, and this is supposed to be a controlled vocabulary
rather than a systemisation-of-all-knowledge.
---
4. The number of vocabularies we intend to produce (in particular
whether we produce a pair of `IAU' thesauri, including a corrected
and updated one, and which UCD vocabulary we use), and which
interrelationships we plan to publish
I'd suggest one each for A&A, AOIM, UCD, IAU original and possibly
IAU updated. That is, we publish a SKOSified version of the
_original_ IAU thesaurus, with all its spelling mistakes and
outdatedness, and don't touch that thereafter. Subsequently or
simultaneously we could and should publish a preened and updated one,
with whatever IAU imprimaturs Rob or others can bring upon us.
The interrelationships are vital, I think, but they should not be
included in the SKOS files for the various vocabularies, but as
separate items in the standard document set. They can be produced
and maintained on a separate timescale.
---
5. Which namespace we use
The options I can see are
* A dedicated namespace like ns.ivoa.net/thesauri or
www.ivoa.net/thesauri
* The namespace implied by (the current revision of) the WD, thus
www.ivoa.net/Documents/Vocabularies-2007-12-xx/ is the document,
and .../Vocabularies-2007-12-xx/{IAU-legacy,IAU,UCD,AandA,AOIM}# are
the namespaces of the various vocabularies published with it.
I think I prefer the second (though I wouldn't go to the stake about
it). For what it's worth, I think this is compatible with the W3C's
practice.
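
As a sketch of how the second option would work in practice, the
Python below builds a vocabulary namespace and a concept URI from the
document URL. The http:// scheme, the function names and the example
concept label are assumptions for illustration; the `xx' in the date
is the same placeholder as in the text above.

```python
# Document URL for the WD; 'xx' is a placeholder, and the http:// scheme
# is an assumption (the text above gives the URL without a scheme).
DOCUMENT_URL = "http://www.ivoa.net/Documents/Vocabularies-2007-12-xx/"

# Vocabulary names as listed in the second option above.
VOCABULARIES = ("IAU-legacy", "IAU", "UCD", "AandA", "AOIM")


def vocabulary_namespace(vocabulary: str, document_url: str = DOCUMENT_URL) -> str:
    """Return the namespace of one vocabulary published with the document."""
    if vocabulary not in VOCABULARIES:
        raise ValueError(f"unknown vocabulary: {vocabulary!r}")
    return f"{document_url}{vocabulary}#"


def concept_uri(vocabulary: str, label: str) -> str:
    """Return the URI of a concept: vocabulary namespace plus concept label."""
    return vocabulary_namespace(vocabulary) + label


if __name__ == "__main__":
    # 'spiralgalaxies' is a hypothetical concept label.
    print(concept_uri("IAU", "spiralgalaxies"))
```

One consequence of this option is that each new dated revision of the
document yields a new set of namespaces, which we would need to be
comfortable with.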
---
6. The WD which documents this
My fault: I'm late producing this, after having said at the October
VO-TECH meeting that it would take very little time.
I anticipate a fairly thin document, documenting the decision to use
SKOS and describing what's available, along with some relatively
small quantity of high-level rationale for the decisions taken, and
acting as a placeholder and namespace for the normative SKOS files
attached to it. Authorship would be whoever contributes text or SKOS
stuff.
How does all this sound?
All the best,
Norman
--
------------------------------------------------------------
Norman Gray : http://nxg.me.uk
eurovotech.org : University of Leicester, UK