Format of tokens
Frederic V. Hessman
Hessman at Astro.physik.Uni-Goettingen.DE
Wed Nov 14 03:24:21 PST 2007
On 14 Nov 2007, at 11:24 am, Alasdair Gray wrote:
>> Number of TopConcepts: 1325
> I do not agree with this figure (see next comment).
>
>> Thus, you can't assume that the BT's and NT's are all present in
>> the original (trex.txt). Alasdair's figure of 512 top concepts
>> assumed that the IAU thesaurus was reasonably complete and self-
>> consistent.
> I cannot claim to have looked closely at the BT/NT relationships in
> the original (trex.txt) file. However, the IAU thesaurus also
> issues a hierarchy file (hierlist.txt). This file gives the
> hierarchy of the original thesaurus and it is this that has 516 top
> level concepts. Rick has assumed that a top level concept is one
> that does not have a broader term. For the IVOAT I would agree with
> this as it should result in a less confusing hierarchy that matches
> users' expectations. However, for the IAU93 this is wrong as it
> results in a different number of top level concepts (although I
> would have thought that it would have been fewer than 516 since some
> of these terms appear as narrower terms of other concepts) and thus
> a different hierarchy from the original version of the thesaurus.
Aha! I'm sure I'll leave this fine point to the experts, but I would
have thought that a "TopConcept" is one which is at the top of a
connection-hierarchy (after being chastened, I won't say
"ontological"). If there is a concept "gummi bears" but no "BT
candy" then the authors of the vocabulary have obviously left "candy"
out for some reason, making "gummi bears" pretty top-level to me.
Or is my naïveté showing? I assumed that hierlist.txt was simply
their best attempt back when all of this was much more painful (yes,
this project has now forced me to learn lots of Python, as I
intended, but at least I'm not doing this on paper or with an Intel
286 under DOS).
Dropping the TopConcept links to entries with no NT's is trivial - is
this the general consensus?
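
To make the two candidate definitions concrete, here is a rough Python sketch. It is not the actual conversion script: the dictionary layout, the function name top_concepts and the require_narrower flag are all made up for illustration, and the toy entries are not taken from trex.txt. It assumes each term has already been parsed into lists of its BT's and NT's.

```python
def top_concepts(thesaurus, require_narrower=False):
    """Return the sorted terms that have no broader term (BT).

    thesaurus is a hypothetical mapping of
    term -> {"BT": [...], "NT": [...]}.
    With require_narrower=True, terms that also lack narrower
    terms (NT) are dropped, i.e. isolated entries are not
    counted as top concepts.
    """
    tops = []
    for term, relations in thesaurus.items():
        if relations.get("BT"):
            continue  # has a broader term, so it is not at the top
        if require_narrower and not relations.get("NT"):
            continue  # isolated entry: no BT and no NT
        tops.append(term)
    return sorted(tops)


# Toy data for illustration only.
toy = {
    "gummi bears": {"BT": [], "NT": []},
    "stars": {"BT": [], "NT": ["dwarf stars"]},
    "dwarf stars": {"BT": ["stars"], "NT": []},
}

print(top_concepts(toy))                         # "no BT" definition
print(top_concepts(toy, require_narrower=True))  # additionally requires an NT
```

Under the first definition "gummi bears" counts as a top concept; with require_narrower=True it drops out and only "stars" remains.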
Rick