Vocabularies: next steps

Ed Shaya eshaya at umd.edu
Wed Nov 21 09:40:40 PST 2007

Norman Gray wrote:
> Greetings, all.
> I've been somewhat detached from the vocabularies discussion for the 
> last couple of weeks (hassle on another project).  I'd quite like to 
> move this forward, though, with a view to getting some initial draft 
> out by the end of the calendar year.
> The outstanding issues appear to be:
> 1. The format of the concept labels (case and character set)
We need to decide just what this vocabulary will be used for.  If the 
terms are to be used in ontologies, N3 statements, UCDs, etc., then 
they need to be readable.  Restricting labels to the set [a-z] does not 
make them readable.  The argument that these are just the IDs and no 
human will ever need to see them is only true if the SKOS is not 
actually used.
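As a rough illustration of the readability point, here is a minimal 
Python sketch.  The helper name split_label and the sample labels are 
assumptions made for this example, not part of any IVOA tooling.  It 
shows that a camelCase label can be split back into words mechanically, 
while an all-lowercase label cannot.

```python
import re


def split_label(label):
    """Split a concept label into lowercase words at camelCase boundaries."""
    # Each match is an optional capital followed by lowercase letters or digits.
    return [word.lower() for word in re.findall(r"[A-Z]?[a-z0-9]+", label)]


# Hypothetical labels, used only for illustration.
print(split_label("groupsOfGalaxies"))  # word boundaries are recoverable
print(split_label("groupsofgalaxies"))  # comes back as one undivided token
```

With [a-z] only, the word boundaries would have to be restored by hand 
or carried separately in a skos:prefLabel.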
> 2. The grammatical number of the concept names (singular or plural)
We surely want singular.  There is a skos:definition.  Do we define the 
plural term or the singular term there?  Is it Galaxies, "Systems with 
many stars," or Galaxy, "A system with stars"?  We call SKOS a 
thesaurus, but it is also a dictionary, and all dictionaries are in the 
singular.
I don't think anyone wants to see "LocalGroup a groupsOfGalaxies".
Plus there is already great confusion in the IVOAT because of this 
issue: We have mass (not masses), but massRatios (not massRatio).  We 
have structure but not structures.  Taxonomy but not taxonomies.  If 
everything were singular it would just be easier.

> 3. The number of top concepts in the IAU thesaurus
This is a purely arbitrary number.  You (not I) need to decide what a 
topConcept is.  Is it truly those things with no broader term, or is it 
the set of most useful and yet fairly broad terms that one would see in 
the index of a book?  Creating a good index for a book is an art, not a 
science.  If it is only the broadest terms, then consider that we 
probably should have the following terms: astronomy, physics, chemistry, 
statistics, math, units, bibliography and instrumentation.  Nearly every 
term we have is a narrower term to one of these.  So these would be the 
8 topConcepts.  In an ontology these would probably be the top 
namespaces.  This is fairly close to what was done in UCDs.
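The strict reading of topConcept (a concept with no broader term) can 
be computed mechanically.  Here is a minimal Python sketch, assuming 
the broader relations are held in a plain dict; the function name 
top_concepts and the sample terms are hypothetical, and this is not 
taken from any existing SKOS tool.

```python
def top_concepts(broader):
    """Return, sorted, every concept that has no broader term.

    `broader` maps a concept label to the list of its broader concepts.
    """
    concepts = set(broader)
    # Concepts that appear only as someone's broader term count too.
    for parents in broader.values():
        concepts.update(parents)
    return sorted(c for c in concepts if not broader.get(c))


# Hypothetical fragment of a vocabulary, for illustration only.
sample = {
    "spiralGalaxy": ["galaxy"],
    "galaxy": ["astronomy"],
    "photon": ["physics"],
    "astronomy": [],
}
print(top_concepts(sample))
```

The "book index" reading cannot be derived this way; it needs an 
editorial choice of which broad terms to list.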
> 4. The number of vocabularies we intend to produce (in particular 
> whether we produce a pair of `IAU' thesauri, including a corrected and 
> updated one, and which UCD vocabulary we use), and which 
> interrelationships we plan to publish
> 5. Which namespace we use
> 6. The WD which documents this
Aren't we talking about a Note?
Does this include a mechanism for compound statements?  I haven't 
heard the discussion on this (other than my suggestion to use N3).
> 7. How we manage the development and release of the vocabularies
> *. Any others?
> ==========
> My responses to these are below, but I think the most important one is 
> [7]
> 7. How we manage the development and release of the vocabularies
> We currently seem to have at least three sets of vocabulary SKOS 
> files, namely Rick's, Alasdair's and Doug's (my offerings have been 
> absorbed into Alasdair's), and we have an outline WD in my repository 
> -- we should perhaps start to share this stuff, and look towards 
> making at least a first release of all this before the end of this 
> calendar year.
> We could look at some distributed VC system such as Git, or less fun 
> but more practically share it via SourceForge or Google Code (ie, 
> repository plus issue tracking).  If you all are agreeable, I'll set 
> up a project in one of the latter and start importing.  Shout now.
> That project's repository would be the working copy of the various 
> vocabularies, making releases to the IVOA standards process, which 
> would be the formal master.  There is an issue about the format of the 
> master source -- namely SKOS as a master, some other private format 
> which is reprocessed into SKOS, or something like the Lexicon source 
> file used for the IAU thesaurus; there have been various views on that 
> which we can resolve fairly shortly as a technicality.
> Timescales: How about this?
>     30 Nov: shared project set up and populated with at least WD text
>     7 Dec: some SKOS-generating code in the repository, and a set of 
> technical issues/disagreements identified
>     19 Dec: loose agreement on technical issues, and a first version 
> of a WD document with normative SKOS appendices released to the IVOA 
> documents process from the shared repository
> ---
> 1. The format of the concept labels (case and character set)
> I'd vote for Rick's option 2, bareboneslowercasenames, but we can 
> possibly argue about that more constructively during step 2 above.
> ---
> 2. The grammatical number of the concept names (singular or plural)
> It seems that English-language thesauri `traditionally' have concepts 
> labelled with plurals, whereas French and German ones typically have 
> concepts labelled with singular terms.  That's according to ISO-5964.  
> I don't think it's a big deal, but the examples in the SKOS docs are 
> indeed either abstract nouns or plurals.
> ---
> 3. The number of top concepts in the IAU thesaurus
> The plenitude of top concepts does seem odd, I agree, but I don't 
> think it's necessarily a problem.  There's nothing really magical 
> about a tree, and this is supposed to be a controlled vocabulary 
> rather than a systemisation-of-all-knowledge.
> ---
> 4. The number of vocabularies we intend to produce (in particular 
> whether we produce a pair of `IAU' thesauri, including a corrected and 
> updated one, and which UCD vocabulary we use), and which 
> interrelationships we plan to publish
> I'd suggest one each for A&A, AOIM, UCD, IAU original and possibly IAU 
> updated.  That is, we publish a SKOSified version of the _original_ 
> IAU thesaurus, with all its spelling mistakes and outdatedness, and 
> don't touch that thereafter.  Subsequently or simultaneously we could 
> and should publish a preened and updated one, with whatever IAU 
> imprimaturs Rob or others can bring upon us.
I don't think the original IAU thesaurus saw any usage.  (Does anyone 
know of any?)  So, what would be the point of a mapping to it?
> The interrelationships are vital, I think, but they should not be 
> included in the SKOS files for the various vocabularies, but as 
> separate items in the standard document set.  They can be produced and 
> maintained on a separate timescale.
> ---
> 5. Which namespace we use
> The options I can see are
>   * A dedicated namespace like ns.ivoa.net/thesauri or 
> www.ivoa.net/thesauri
>   * The namespace implied by the (current revision of) the WD, thus 
> www.ivoa.net/Documents/Vocabularies-2007-12-xx/ is the document, and 
> .../Vocabularies-2007-12-xx/{IAU-legacy,IAU,UCD,AandA,AOIM}# are the 
> namespaces of the various vocabularies published with it.
> I think I prefer the second (though I wouldn't go to the stake about 
> it).  For what it's worth, I think this is compatible with the W3C's 
> practice.
> ---
> 6. The WD which documents this
> My fault: I'm late producing this, after having said at the October 
> VO-TECH meeting that it would take very little time.
> I anticipate a fairly thin document, documenting the decision to use 
> SKOS and describing what's available, along with some relatively small 
> quantity of high-level rationale for the decisions taken, and acting 
> as a placeholder and namespace for the normative SKOS files attached 
> to it.  Authorship would be whoever contributes text or SKOS stuff.
> How does all this sound?
> All the best,
> Norman

