Vocabularies: next steps

Norman Gray norman at astro.gla.ac.uk
Wed Nov 21 02:18:30 PST 2007


Greetings, all.

I've been somewhat detached from the vocabularies discussion for the  
last couple of weeks (hassle on another project).  I'd quite like to  
move this forward, though, with a view to getting some initial draft  
out by the end of the calendar year.



The outstanding issues appear to be:

1. The format of the concept labels (case and character set)

2. The grammatical number of the concept names (singular or plural)

3. The number of top concepts in the IAU thesaurus

4. The number of vocabularies we intend to produce (in particular  
whether we produce a pair of `IAU' thesauri, including a corrected  
and updated one, and which UCD vocabulary we use), and which  
interrelationships we plan to publish

5. Which namespace we use

6. The WD which documents this

7. How we manage the development and release of the vocabularies

*. Any others?

==========

My responses to these are below, but I think the most important one  
is [7], so I'll take that first.


7. How we manage the development and release of the vocabularies

We currently seem to have at least three sets of vocabulary SKOS  
files, namely Rick's, Alasdair's and Doug's (my offerings have been  
absorbed into Alasdair's), and we have an outline WD in my repository.  
We should perhaps start to share this stuff, and aim to make at least  
a first release of all of it before the end of this calendar year.

We could use a distributed version-control system such as Git or,  
less fun but more practical, share it via SourceForge or Google Code  
(ie, repository plus issue tracking).  If you're all agreeable, I'll  
set up a project on one of the latter and start importing.  Shout now.

That project's repository would hold the working copy of the various  
vocabularies, making releases to the IVOA standards process, which  
would be the formal master.  There is an open question about the  
format of the master source: SKOS itself as the master, some other  
private format which is reprocessed into SKOS, or something like the  
Lexicon source file used for the IAU thesaurus.  There have been  
various views on that, which we can resolve fairly shortly as a  
technicality.
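To make the "private format reprocessed into SKOS" option a bit more concrete, here is a minimal Python sketch of the kind of SKOS-generating code that might go into the repository. The line-based `name | label | broader` format, the function `to_turtle`, and the example base URI are all hypothetical, invented only for illustration; only the SKOS namespace URI is the standard one.

```python
# Minimal sketch: reprocess a HYPOTHETICAL private master format into SKOS (Turtle).
# Each non-blank, non-comment line is:  name | preferred label | broader name (may be empty)

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def to_turtle(source: str, base: str) -> str:
    """Convert the hypothetical line format into a Turtle serialisation of SKOS."""
    out = [f"@prefix skos: <{SKOS_NS}> .", f"@prefix : <{base}> .", ""]
    for raw in source.splitlines():
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue  # skip blank lines and comment lines
        # Exactly three '|'-separated fields are expected; unpacking fails otherwise.
        name, label, broader = (field.strip() for field in raw.split("|"))
        # Escape backslashes and double quotes so the label is a valid Turtle string.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        out.append(f":{name} a skos:Concept ;")
        if broader:
            out.append(f"    skos:broader :{broader} ;")
        out.append(f'    skos:prefLabel "{escaped}"@en .')
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = "galaxies | Galaxies |\nspiralgalaxies | Spiral galaxies | galaxies\n"
    print(to_turtle(example, "http://example.org/IAU#"))
```

The sketch does not check that names are legal Turtle local names; real generating code would need to, and would probably use an RDF library rather than string formatting.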

Timescales: How about this?

     30 Nov: shared project set up and populated with at least WD text
      7 Dec: some SKOS-generating code in the repository, and a set of
             technical issues/disagreements identified
     19 Dec: loose agreement on technical issues, and a first version
             of a WD document with normative SKOS appendices released
             to the IVOA documents process from the shared repository

---

1. The format of the concept labels (case and character set)

I'd vote for Rick's option 2, bareboneslowercasenames, but we can  
possibly argue about that more constructively during the 7 Dec step  
above.
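For illustration only, here is a Python sketch of how labels might be mapped to barebones lowercase names. It assumes option 2 means lower-casing, folding accented letters to plain ASCII, and dropping everything outside a-z and 0-9; that reading, and the helper names `barebones_name` and `name_collisions`, are my assumptions rather than Rick's definition.

```python
from __future__ import annotations

import re
import unicodedata
from collections import defaultdict


def barebones_name(label: str) -> str:
    """Map a label to a 'barebones lowercase' name (ASSUMED reading of option 2)."""
    # NFKD splits accented letters into base letter + combining mark ...
    decomposed = unicodedata.normalize("NFKD", label)
    # ... so dropping non-ASCII code points leaves the unaccented base letter.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep only lowercase letters and digits; spaces, hyphens, etc. disappear.
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def name_collisions(labels: list[str]) -> dict[str, list[str]]:
    """Report names produced by more than one distinct label."""
    by_name = defaultdict(list)
    for label in labels:
        by_name[barebones_name(label)].append(label)
    return {name: group for name, group in by_name.items() if len(group) > 1}
```

For example, `barebones_name("Hertzsprung-Russell diagram")` gives `hertzsprungrusselldiagram`. The collision check is there because squashing labels this hard can map two different labels (say "X-rays" and "X rays") to the same name, which is one of the things worth arguing about.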

---

2. The grammatical number of the concept names (singular or plural)

It seems that English-language thesauri `traditionally' have concepts  
labelled with plurals, whereas French and German ones typically have  
concepts labelled with singular terms.  That's according to  
ISO 5964.  I don't think it's a big deal, but the examples in the  
SKOS docs are indeed either abstract nouns or plurals.

---

3. The number of top concepts in the IAU thesaurus

The plenitude of top concepts does seem odd, I agree, but I don't  
think it's necessarily a problem.  There's nothing really magical  
about a tree, and this is supposed to be a controlled vocabulary  
rather than a systematisation-of-all-knowledge.
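One way to see why many top concepts is harmless: in SKOS a top concept is just a concept that the scheme points at with `skos:hasTopConcept`, and a natural candidate is any concept with no `skos:broader` link. The Python sketch below assumes the broader relations have already been loaded into a plain dictionary; the function names and the `:` prefix binding are hypothetical.

```python
from __future__ import annotations


def top_concepts(concepts: set[str], broader: dict[str, set[str]]) -> set[str]:
    """Concepts with no skos:broader link: candidates for skos:hasTopConcept."""
    return {c for c in concepts if not broader.get(c)}


def top_concept_triples(scheme: str, tops: set[str]) -> list[str]:
    """Turtle lines linking a scheme to each top concept (':' prefix assumed bound)."""
    return [f":{scheme} skos:hasTopConcept :{t} ." for t in sorted(tops)]


if __name__ == "__main__":
    concepts = {"galaxies", "spiralgalaxies", "comets"}
    broader = {"spiralgalaxies": {"galaxies"}}
    for line in top_concept_triples("IAU", top_concepts(concepts, broader)):
        print(line)
```

A flat vocabulary simply produces a long list of such lines; nothing in the structure requires a single root.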

---

4. The number of vocabularies we intend to produce (in particular  
whether we produce a pair of `IAU' thesauri, including a corrected  
and updated one, and which UCD vocabulary we use), and which  
interrelationships we plan to publish

I'd suggest one each for A&A, AOIM, UCD, IAU original and possibly  
IAU updated.  That is, we publish a SKOSified version of the  
_original_ IAU thesaurus, with all its spelling mistakes and  
outdatedness, and don't touch that thereafter.  Subsequently or  
simultaneously we could and should publish a preened and updated one,  
with whatever IAU imprimaturs Rob or others can bring upon us.

The interrelationships are vital, I think, but they should be  
published as separate items in the standard document set rather than  
included in the SKOS files for the various vocabularies.  They can  
then be produced and maintained on a separate timescale.
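As a sketch of what keeping the interrelationships separate could look like, the Python below writes cross-vocabulary links as a standalone N-Triples file that neither vocabulary file needs to contain. Using `skos:closeMatch` (from the later W3C SKOS Recommendation) as the mapping property is an assumption, as are the function name and the example.org namespaces.

```python
from __future__ import annotations

# skos:closeMatch comes from the later W3C SKOS Recommendation; using it here is an assumption.
CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"


def mapping_ntriples(pairs: list[tuple[str, str]], source_ns: str, target_ns: str,
                     prop: str = CLOSE_MATCH) -> str:
    """Serialise cross-vocabulary links as N-Triples, kept apart from either vocabulary."""
    return "".join(f"<{source_ns}{a}> <{prop}> <{target_ns}{b}> .\n" for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical namespaces and concept names, for illustration only.
    links = [("galaxies", "galaxies"), ("spiralgalaxies", "spiralgalaxies")]
    print(mapping_ntriples(links, "http://example.org/IAU#", "http://example.org/AandA#"), end="")
```

Because the mapping file only refers to concepts by URI, it can be regenerated or extended without re-releasing either vocabulary.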

---

5. Which namespace we use

The options I can see are

   * A dedicated namespace like ns.ivoa.net/thesauri or www.ivoa.net/thesauri
   * The namespace implied by (the current revision of) the WD, so that
     www.ivoa.net/Documents/Vocabularies-2007-12-xx/ is the document,
     and .../Vocabularies-2007-12-xx/{IAU-legacy,IAU,UCD,AandA,AOIM}# are
     the namespaces of the various vocabularies published with it.

I think I prefer the second (though I wouldn't go to the stake about  
it).  For what it's worth, I think this is compatible with the W3C's  
practice.
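A small Python sketch of how concept URIs would be built under the second option. The `http://` scheme and the helper names are assumptions, and `2007-12-xx` is kept exactly as the placeholder it is above.

```python
# Namespace pattern from the second option above. The 'http://' scheme is an assumption,
# and '2007-12-xx' is left exactly as the placeholder it is in the message.
DOC_BASE = "http://www.ivoa.net/Documents/Vocabularies-2007-12-xx/"
VOCABULARIES = ("IAU-legacy", "IAU", "UCD", "AandA", "AOIM")


def vocabulary_namespace(vocab: str, doc_base: str = DOC_BASE) -> str:
    """Hash namespace for one vocabulary published alongside a given document revision."""
    if vocab not in VOCABULARIES:
        raise ValueError(f"unknown vocabulary: {vocab!r}")
    return f"{doc_base}{vocab}#"


def concept_uri(vocab: str, name: str, doc_base: str = DOC_BASE) -> str:
    """Full URI of a concept: the vocabulary namespace plus the concept name as fragment."""
    return vocabulary_namespace(vocab, doc_base) + name
```

Since the document revision is part of every URI, a later revision mints new URIs rather than altering old ones, which fits freezing the IAU-legacy vocabulary; the flip side is that unchanged concepts also acquire new URIs with each revision.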

---

6. The WD which documents this

My fault: I'm late producing this, after having said at the October  
VO-TECH meeting that it would take very little time.

I anticipate a fairly thin document, documenting the decision to use  
SKOS and describing what's available, along with some relatively  
small quantity of high-level rationale for the decisions taken, and  
acting as a placeholder and namespace for the normative SKOS files  
attached to it.  The authors would be whoever contributes text or SKOS  
stuff.



How does all this sound?

All the best,

Norman


-- 
------------------------------------------------------------
Norman Gray  :  http://nxg.me.uk
eurovotech.org  :  University of Leicester, UK



