IVOAT update, new and improved docs, suggestion for compromise on token format

Mon Nov 5 13:45:49 PST 2007

The updated documentation on the present IVOAT version now works  
correctly and the dictionary is split up into separate files for each  
beginning letter to make things faster.

	http://www.astro.physik.uni-goettingen.de/~hessman/rdf/IVOAT/

The thesaurus has been cleaned of all U's and UF's, there are now BT's  
for every NT and vice versa, and the number of TopConcepts keeps  
dropping (currently 1203) as I find the time to fix them (mostly  
hundreds of trivial missing BT's in the original IAU thesaurus).
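
For anyone who wants to run the same kind of consistency check on their own copy, here is a minimal sketch of how the BT/NT reciprocity and the TopConcept count could be verified. It is not the script I use: the in-memory layout (a dict mapping each token to its "BT" and "NT" sets), the function names, and the sample entries are all assumptions made up for illustration.

```python
def missing_reciprocals(concepts):
    """Return (token, relation, target) triples that are missing.

    `concepts` is a hypothetical layout: token -> {"BT": set, "NT": set}.
    A triple (t, "BT", x) means: t should list x as a broader term.
    """
    missing = []
    for token, rel in concepts.items():
        # Every broader term must list this token as a narrower term.
        for broader in rel.get("BT", ()):
            if token not in concepts.get(broader, {}).get("NT", ()):
                missing.append((broader, "NT", token))
        # Every narrower term must list this token as a broader term.
        for narrower in rel.get("NT", ()):
            if token not in concepts.get(narrower, {}).get("BT", ()):
                missing.append((narrower, "BT", token))
    return sorted(missing)


def top_concepts(concepts):
    """Tokens without any broader term, i.e. candidate TopConcepts."""
    return sorted(t for t, rel in concepts.items() if not rel.get("BT"))


# Made-up sample: the BT on "spiral_galaxies" is missing.
sample = {
    "galaxies": {"NT": {"spiral_galaxies"}},
    "spiral_galaxies": {},
}
print(missing_reciprocals(sample))
print(top_concepts(sample))
```

In the sample, the missing BT makes "spiral_galaxies" show up as a spurious TopConcept, which is exactly the kind of entry that inflates the count.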

> As I said, we can't use the preLabels because they have spaces and  
> the IDs are not easily readable, so you give us nothing we can use  
> directly.   Nearly everyone uses either camelBack or underscores.   
> But maybe this is moot anyway since all ontologies are done in the  
> singular.

We've nominally hashed out these issues before without coming to a  
conclusion.  I'm happy with keeping the format in the present text  
file, e.g. "Tully-Fisher_relation" instead of "tullyfisherrelation",  
but even the latter is readable and "TullyFisherRelation" is also OK  
(also easily derived from the present tokens).  Nobody's talking  
about using the prefLabels for ontology tokens, or am I getting this  
wrong?  If people are going to edit their OWL files the same way I'm  
editing the vocabulary text file (ahem...... with... a.... vi), then  
I can understand that a simple solution would be nice.   I thought  
all those neat ontology tools are light years ahead of the leading  
edge apps of the 70's......

> So what is the vote count anyway?

Well, only a small fraction of the nominal voters has voted.  If  
those who haven't voted don't speak up soon, they can't complain - or  
they simply don't care.

How about if I help by explicitly defining the standard options I see  
and listing the positive and negative features:

	Option 1: keep present tokens, e.g. "Tully-Fisher_relation" (no change from text file)
		+ no change needed
		+ human readable
		+ can use capitalization to preserve standard meaning (e.g. the element "mercury" vs. the planet "Mercury")
		- uses lower and upper case, underscores and dashes, so may cause parsers to crash and users to make mistakes

	Option 2: bare-bones lowercase, e.g. "tullyfisherrelation" (upper->lower, delete hyphens and underscores)
		+ trivial parsing of lower case only
		+ stresses that tokens are only tokens and that humans should be  
using the labels
		- barely human readable; humans are going to use the tokens anyway

	Option 3: "camelBack" syntax, e.g. "TullyFisherRelation" (present hyphens and underscores deleted, but they define where sub-word capitalization occurs)
		+ simple to parse
		+ human readable
		- parsers still have to deal with both upper and lower case (but nothing else)
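
To make the three options concrete, here is a minimal sketch of how the Option 2 and Option 3 forms could be derived from the present (Option 1) tokens. The function names are hypothetical and this is not necessarily how the IVOAT conversion is actually done; it only assumes that hyphens and underscores are the sole word separators in the present tokens.

```python
import re


def to_lowercase_token(token):
    """Option 2: delete hyphens and underscores, then lowercase."""
    return re.sub(r"[-_]", "", token).lower()


def to_camel_token(token):
    """Option 3: hyphens and underscores mark the word boundaries.

    Each word gets a capital first letter; the rest of the word is
    left untouched so that acronyms keep their capitals.
    """
    words = [w for w in re.split(r"[-_]", token) if w]
    return "".join(w[0].upper() + w[1:] for w in words)


print(to_lowercase_token("Tully-Fisher_relation"))  # tullyfisherrelation
print(to_camel_token("Tully-Fisher_relation"))      # TullyFisherRelation
```

Note that under this sketch both "mercury" and "Mercury" come out as "Mercury" in Option 3 (and as "mercury" in Option 2), so any distinction carried only by capitalization would have to be handled some other way.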

Looking at it this way, and because Options 1 and 2 both seem to draw  
complaints, maybe the best compromise is Option 3, i.e. all tokens  
start with a capital letter, as do all sub-token words, so that  
they all look pretty much the same.   To give you all a taste, I've  
implemented Option 3 in the current IVOAT (take a look at the list of  
tokens).

Any strong complaints?

Rick