New issue?: vocabulary maintenance
Brian Thomas
thomas at astro.umd.edu
Wed Feb 6 06:45:49 PST 2008
On Wednesday 06 February 2008 8:06:39 am Frederic V. Hessman wrote:
> Strictly speaking, 99% of the "definitions" in the present IVOAT are
> simply human-readable versions of the token names. The few exceptions
> are mostly new entries like the atomic elements (e.g. "actinium" has
> the description "actinium (atomic number 89)") or a few where the
> meaning needed to be more precise or isn't commonly known (e.g.
> BaileyType). Thus, one could create minimal descriptions also for
> the IAU-93 thesaurus by simply de-capitalizing. I did this for the
> IVOAT, but I admit it took a bit more than just running a python
> script over it to get it right.
I think we want more than this. Ideally, we want something like
a few sentences to a paragraph per term. Using WordNet or DBpedia/Wikipedia
can speed us towards 'acceptable' definitions for most terms/tokens. On
a population of 1000 terms, I was able to use WordNet to garner 800
or so definitions. Of those, the overall accuracy (this is from memory)
was about 75% (in other words, about 75% of the time the definition looked
fine with no editing).
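To make the workflow and the numbers concrete, here is a rough Python sketch of the kind of harvesting I mean. It is only an illustration under stated assumptions: the `collect_definitions` and `estimate_usable` helpers and the toy dictionary are made up for this example, and the lookup function is a stand-in for whatever source is really used (WordNet, DBpedia, ...). The 800 and 0.75 figures are the approximate ones quoted above.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical helper: ask a definition source about each term and keep
# only the terms for which it returns something.
def collect_definitions(
    terms: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    found = {}
    for term in terms:
        definition = lookup(term)
        if definition:
            found[term] = definition
    return found

# Hypothetical helper: how many harvested definitions are expected to be
# usable without editing, given an estimated accuracy.
def estimate_usable(n_found: int, accuracy: float) -> int:
    return round(n_found * accuracy)

# Toy stand-in for a real source; these entries are illustrative only.
TOY_SOURCE = {
    "actinium": "a radioactive metallic element (atomic number 89)",
    "galaxy": "a large gravitationally bound system of stars, gas and dust",
}

terms = ["actinium", "galaxy", "BaileyType"]
found = collect_definitions(terms, TOY_SOURCE.get)
coverage = len(found) / len(terms)

# With the approximate figures above: ~800 definitions found, ~75% fine as-is.
usable = estimate_usable(800, 0.75)
print(f"toy coverage: {coverage:.0%}, estimated usable of 800: {usable}")
```

With those rough figures, something like 600 of the 1000 terms would get a definition needing no editing; the rest would need manual work or another source.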
I imagine that we can create definitions which are "generally" accurate and
acceptable. For the really controversial terms (and how many of these can
there possibly be??) we can provide pointers to 'seminal' papers or, better
yet, just drop the definition altogether and save the argument for a rainy day.
=brian