Vocabularies: next steps

Brian Thomas thomas at astro.umd.edu
Tue Nov 27 07:31:47 PST 2007


On Tuesday 27 November 2007, Frederic V. Hessman wrote:
> >> 	Set: a-Z, 0-9
> > You're quite right.  I meant the concept URI: the concept fragment
> > should I believe/agree, be drawn from [a-z0-9], though I wouldn't
> > push very hard against [a-zA-Z0-9].  The prefLabel and altLabel
> > fields should be Unicode.
> >
> > [AG] I would probably argue for [a-zA-Z0-9]
> 
> "a-Z,0-9" was meant to mean exactly this.  By now, I think we can all  
> agree on this.

It's a narrowing of the spec, which makes me uncomfortable, but I can
agree to it, and am happy to let the matter rest.
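
For what it's worth, the agreed set is easy to check mechanically. A minimal
sketch, assuming the rule is "one or more characters from [a-zA-Z0-9]" (the
function name is just for illustration, not part of any spec):

```python
import re

# Assumed rule from this thread: a token is one or more of [a-zA-Z0-9].
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def is_valid_token(token: str) -> bool:
    """Return True if the whole token uses only ASCII letters and digits."""
    return TOKEN_RE.fullmatch(token) is not None
```

Anything with punctuation, whitespace or non-ASCII letters is rejected, which
is exactly the narrowing mentioned above; prefLabel and altLabel would not go
through this check.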

> 
> >>> The number of top concepts in the IAU thesaurus
> >> Huh?  The IAU thesaurus is the IAU thesaurus.  If "top concepts"
> >> are defined either as 1) not having a BT or 2) having a NT, then
> >> the number is already fixed.  Basta.
> > [AG] I still feel that for the IAU93 Thesaurus we should adopt the  
> > list
> > of tokens given in the web version. However, I agree with Norman that
> > the top concepts are there to aid the navigation and for no other
> > reason. When it comes to the IVOAT, I would think that the top  
> > concepts
> > are those that do not have a BT.
> 
> For simplicity and consistency, I would argue that we define "top  
> concepts" as those not having a BT.
> 
> This should be part of the IVOA vocabulary guidelines, e.g. (here's  
> my first cut)
> 
> 1. A single SKOS document defines the vocabulary and must be  
> publicly available at some URI, preferably
> 	at the central IVOA vocabulary repository http://www.ivoa.net/?????  
> at least as a copy.

I have heard that it is desirable to make a URI *not* match up with an actual
location (i.e. it should not be a functioning URL). However, an RDF best
practices doc at W3C, http://www.w3.org/TR/swbp-vocab-pub/, says the
opposite:

"The URI namespace you choose for your vocabulary should be a Web address (a URI) 
to which you have write access. Others who use your vocabulary will expect to be able 
to dereference both the vocabulary URI itself as well as the URIs of properties and 
classes defined by your vocabulary."

The remaining issue is whether to choose a 'hash' or a 'slash' namespace. The
former suits smaller vocabularies, while the latter suits larger ones, where terms
are being added frequently and/or the whole vocabulary is not to be retrieved at one time.
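
To make the distinction concrete, here is a minimal sketch that splits a
concept URI into namespace and local name and reports which style it uses.
The function name is illustrative, and the 'slash' URI in the usage note is a
made-up variant of the Food example from this thread:

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_concept_uri(uri: str) -> Tuple[str, str, str]:
    """Return (style, namespace, local_name) for a concept URI."""
    base, fragment = urldefrag(uri)
    if fragment:
        # Hash namespace: the term is the fragment after '#'.
        return ("hash", base + "#", fragment)
    # Slash namespace: the term is the last path segment.
    namespace, _, local_name = uri.rpartition("/")
    return ("slash", namespace + "/", local_name)
```

With this, http://www.ivoa.net/Thesauri/Food#Apple is reported as 'hash' with
local name "Apple", while a hypothetical
http://www.ivoa.net/Thesauri/Food/Apple is reported as 'slash'.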

> 
> 2. A concept token has the form
> 
> 			{URI-root}{vocabulary-name}#{token}
> 
> 	where the token should consist only of the letters a-z, A-Z, and the  
> numbers 0-9.  The URI root and vocabulary
> 	name should be set centrally and not in the definition of each  
> token.  For example, if a nominal concept is
> 
> 			http://www.ivoa.net/Thesauri/Food#Apple
> 
> 	(root="http://www.ivoa.net/Thesauri/", name="Food", token="Apple"),  
> then the SKOS definition begins with
> 
> 			<skos:Concept rdf:about="#Apple">

Note that this approach requires a 'hash' namespace URI for the vocabulary/thesaurus,
since the token follows a '#'.
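
For reference, a relative rdf:about="#Apple" is resolved against the base URI
of the SKOS document as a fragment. A minimal sketch, assuming the document
is served at the Food URI from the example and that no xml:base overrides it
(the helper name is illustrative):

```python
from urllib.parse import urljoin

# Base URI of the SKOS document, taken from the example in this thread.
BASE = "http://www.ivoa.net/Thesauri/Food"


def resolve_about(base: str, about: str) -> str:
    """Resolve a relative rdf:about value against the document base URI."""
    return urljoin(base, about)
```

Here resolve_about(BASE, "#Apple") yields the full concept URI quoted above,
http://www.ivoa.net/Thesauri/Food#Apple.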

> 
> 3. One is encouraged to use human-readable forms for the tokens with  
> some obvious connection to
> 	the preferred labels, e.g. conversion from the label via dropping  
> characters not included in the
> 	above list and sub-token separation via capitalization (e.g. "My  
> favorite idea-label #42" ->
> 	"MyFavoriteIdeaLabel42")

This seems to be a reversal of prior 'policy'/groupthink (?).
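
For concreteness, the conversion described in item 3 can be sketched as
follows. This is only one reading of the rule: characters outside
[a-zA-Z0-9] are treated as separators and dropped, and each remaining piece
has its first letter capitalized. The function name is illustrative.

```python
import re


def label_to_token(label: str) -> str:
    """Turn a preferred label into a CamelCase token of [a-zA-Z0-9] only."""
    # Every run of allowed characters becomes one sub-token.
    pieces = re.findall(r"[A-Za-z0-9]+", label)
    # Capitalize the first character and keep the rest unchanged.
    return "".join(piece[0].upper() + piece[1:] for piece in pieces)
```

Applied to "My favorite idea-label #42" this gives "MyFavoriteIdeaLabel42",
as in the example. Note that accented letters are simply dropped, and two
different labels can collapse to the same token, so a uniqueness check would
still be needed.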

> 
> 4. Vocabulary entries should be singular unless based on previously  
> determined sources where the
> 	conversion to singular forms would impare the usefulness of the  
> vocabulary.

On the surface this sounds great. However, "unless ... conversion to singular
forms would impair the usefulness" can be stretched to mean almost anything.

How about an 'acid test' to see how we interpret this rule? Assuming we
extracted terms from a pre-existing source, would the following
terms be made singular, or left plural?

"stars"
"mass ratios"
"telescopes"
"astronomers"
"masses"
"vocabularies"

> 
> 5. Thesaurus entries (BT/NT/RT) are encouraged but not required.
> 

The 'related terms' are next to useless IMO. Without some significant
semantic typing, we can't know what the actual relationship between
concepts is. I'd argue we should NOT encourage RT at all; it's hard to think
of a reasonable use-case/user-story which involves them. In fact, I'd vote
to drop the use of RT from the vocabulary.
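
By contrast, BT/NT links have a definite direction that software can act on.
For example, the "no BT" definition of top concepts proposed above becomes a
simple filter. A minimal sketch over a made-up four-term vocabulary (the
tokens and the dictionary layout are illustrative assumptions, not taken
from IVOAT):

```python
# Hypothetical toy vocabulary: token -> list of broader terms (BT).
BROADER = {
    "Star": [],
    "BinaryStar": ["Star"],
    "Telescope": [],
    "RadioTelescope": ["Telescope"],
}


def top_concepts(broader):
    """Return the tokens that have no broader term, in sorted order."""
    return sorted(token for token, bts in broader.items() if not bts)
```

For this toy data the top concepts are "Star" and "Telescope". An RT link
gives no comparable rule to apply, since it does not say how the two
concepts are related.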

> [snip]
> 9. The maintainers of a vocabulary should provide on-line  
> documentation permitting the easy perusal of labels
> 	and any thesaurus and usage information.  The IVOA will try to  
> maintain a list of links to known vocabularies
> 	and may choose to provide it's own consistent on-line documentation  
> based on the SKOS files alone.

It's not explicit, but any vocabulary in the IVOA namespace should have at least
one copy of its online documentation hosted on the IVOA site!

> 
> 10. The maintainers of a vocabulary should attempt to cross-reference  
> their vocabulary with one or more IVOA
> 	supported vocabularies, e.g. UCD1 and/or IVOAT.
> 
> Anything else?  Having just Ten Commandments would be nice.

This mostly sounds like 'best practices' rather than 'commandments'.

> 
> 
> >>> The grammatical number of the concept names (singular or plural)
> >> Singular, please! - it's a real pain to use the formal system of
> >> singular concepts and plural countables and I agree that singular
> >> should make the vocabulary simple to use
> > I think this is also a non-issue.  If a term is plural in the
> > vocabulary we're adapting (IAU93 and A&A use this convention) then it
> > should remain plural in the SKOS version, otherwise we're making
> > gratuitious changes; if it's singular in the original vocabulary
> > (AOIM) then it should remain singular in SKOS, for the same reason.
> >
> > [AG] The issue raises its head when it comes to the IVOAT. However,
> > since this is based on the IAU93 thesaurus we could, as I believe  
> > is the
> > case, just adopt the IAU93 practice.
> 
> No, in fact I want to remove the plural terms from IVOAT as soon as  
> possible (I finally got to this point in my list of things to do).    
> Any complaints?

None here :) To make the job easier, you may want to look at a singular
one which Ed created from a version of yours last week or so. I can
forward it to you if you would like.

Once you do this, I have a script that looks up definitions in the WordNet
dictionary, which we could use to import a number of text definitions into
the vocabulary.

> 
> External vocabularies like IAU93, AOIM and A&A are pre-defined and so  
> are what they are.  With IVOAT, we can choose to have what we want.
> 
> >>> I wouldn't want to bet which of the vocabularies will end up the  
> >>> most
> > useful in the end...
> 
> Well, the whole purpose of IVOAT is to create something useful.  If  
> we're already failing, please tell me so I can stop now...... :-(
> 

Well, we have an immediate use for it in our ontology work, so there is
at least a population of 'one group'. 

=brian