IVOAT update, new and improved docs, suggestion for compromise on token format
Frederic V. Hessman
Hessman at Astro.physik.Uni-Goettingen.DE
Mon Nov 5 13:45:49 PST 2007
The updated documentation on the present IVOAT version now works
correctly and the dictionary is split up into separate files for each
beginning letter to make things faster.
http://www.astro.physik.uni-goettingen.de/~hessman/rdf/IVOAT/
The thesaurus has been cleaned of all U's and UF's, there are now BT's
for every NT and vice versa, and the number of TopConcepts keeps
dropping (currently 1203) as I find the time (mostly hundreds of
trivial missing BT's in the original IAU thesaurus).
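For anyone who wants to run the same kind of consistency check on their own copy, here is a minimal sketch. It assumes the thesaurus has already been loaded into a plain Python dict; the dict layout, the function names, and the sample terms are hypothetical and are not taken from the IVOAT files.

```python
# Sketch only: assumes a thesaurus held as
#   token -> {"BT": [broader tokens], "NT": [narrower tokens]}
# This layout is a hypothetical illustration, not the IVOAT file format.

def missing_reciprocals(thesaurus):
    """List (term, relation, target) triples that would have to be added
    so that every NT has a matching BT and vice versa."""
    missing = []
    for term, rel in thesaurus.items():
        for narrower in rel.get("NT", ()):
            if term not in thesaurus.get(narrower, {}).get("BT", ()):
                missing.append((narrower, "BT", term))
        for broader in rel.get("BT", ()):
            if term not in thesaurus.get(broader, {}).get("NT", ()):
                missing.append((broader, "NT", term))
    return sorted(missing)


def top_concepts(thesaurus):
    """Terms without any broader term; these end up as TopConcepts."""
    return sorted(t for t, rel in thesaurus.items() if not rel.get("BT"))


# Made-up sample: "spiral_galaxies" lacks its BT back to "galaxies".
sample = {
    "galaxies": {"NT": ["spiral_galaxies"]},
    "spiral_galaxies": {},
}
print(missing_reciprocals(sample))
print(top_concepts(sample))
```

In the sample, the missing BT is what makes "spiral_galaxies" look like a top concept, which is why filling in trivial BT's brings the TopConcept count down.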
> As I said, we can't use the prefLabels because they have spaces and
> the IDs are not easily readable, so you give us nothing we can use
> directly. Nearly everyone uses either camelBack or underscores.
> But maybe this is moot anyway since all ontologies are done in the
> singular.
We've nominally hashed out these issues before without coming to a
conclusion. I'm happy with keeping the format in the present text
file, e.g. "Tully-Fisher_relation" instead of "tullyfisherrelation",
but even the latter is readable and "TullyFisherRelation" is also OK
(also easily derived from the present tokens). Nobody's talking
about using the prefLabels for ontology tokens, or am I getting this
wrong? If people are going to edit their OWL files the same way I'm
editing the vocabulary text file (ahem...... with... a.... vi), then
I can understand that a simple solution would be nice. I thought
all those neat ontology tools are light years ahead of the leading
edge apps of the 70's......
> So what is the vote count anyway?
Well, only a small fraction of the nominal voters has voted. If
those who haven't voted don't speak up soon, they can't complain - or
they simply don't care.
How about if I help by explicitly defining the standard options I see
and listing the positive and negative features:
Option 1: keep present tokens, e.g. "Tully-Fisher_relation" (no
change from text file)
+ no change needed
+ human readable
+ can use capitalization to preserve standard meaning (e.g. the
element "mercury" vs the planet "Mercury")
- uses lower and upper case, underscores and dashes, so may cause
parsers to crash and users to make mistakes
Option 2: bare-bones lowercase, e.g. "tullyfisherrelation"
(upper -> lower, delete hyphens and underscores)
+ trivial parsing of lower case only
+ stresses that tokens are only tokens and that humans should be
using the labels
- barely human readable; humans are going to use the tokens anyway
Option 3: "camelBack"-syntax, e.g. "TullyFisherRelation" (present
hyphens and underscores deleted but they define when sub-word
capitalization occurs)
+ simple to parse
+ human readable
- parsers have to deal with upper and lower case (but nothing else)
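To show how easily Options 2 and 3 can be derived from the present tokens, here is a rough sketch. The function names are made up for illustration, and this is not the script actually used to generate the IVOAT token list; it only assumes that hyphens and underscores are the sole word separators.

```python
import re

# Sketch only: hypothetical helpers, not the actual IVOAT conversion.
# Assumes present-format tokens use only "-" and "_" as word separators.

def to_bare_lowercase(token):
    """Option 2: delete hyphens and underscores, lower-case everything."""
    return re.sub(r"[-_]", "", token).lower()


def to_camel(token):
    """Option 3: delete hyphens and underscores and capitalize the first
    letter of each word they separated; other letters are left alone."""
    words = [w for w in re.split(r"[-_]", token) if w]
    return "".join(w[0].upper() + w[1:] for w in words)


print(to_bare_lowercase("Tully-Fisher_relation"))  # tullyfisherrelation
print(to_camel("Tully-Fisher_relation"))           # TullyFisherRelation
```

One consequence of this particular sketch: "mercury" and "Mercury" both come out as "Mercury", so the capitalization distinction listed under Option 1 would have to be carried some other way.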
Looking at it this way and because Options 1 and 2 seem to create
complaints, maybe the best compromise is Option 3, i.e. all tokens
start with a capital letter and all sub-token words as well, so that
they all look pretty well the same. To give you all a taste, I've
implemented Option 3 in the current IVOAT (take a look at the list of
tokens).
Any strong complaints?
Rick