Format of tokens

Frederic V. Hessman Hessman at Astro.physik.Uni-Goettingen.DE
Wed Nov 14 03:24:21 PST 2007


On 14 Nov 2007, at 11:24 am, Alasdair Gray wrote:
>>             Number of TopConcepts:  1325
>  I do not agree with this figure (see next comment).
>
>> Thus, you can't assume that the BT's and NT's are all present in  
>> the original (trex.txt).  Alasdair's figure of 512 top concepts  
>> assumed that the IAU thesaurus was reasonably complete and
>> self-consistent.
> I cannot claim to have looked closely at the BT/NT relationships in  
> the original (trex.txt) file. However, the IAU thesaurus also  
> issues a hierarchy file (hierlist.txt). This file gives the  
> hierarchy of the original thesaurus and it is this that has 516 top  
> level concepts. Rick has assumed that a top level concept is one  
> that does not have a broader term. For the IVOAT I would agree with  
> this as it should result in a less confusing hierarchy that matches  
> users' expectations. However, for the IAU93 this is wrong as it  
> results in a different number of top level concepts (although I  
> would have thought that it would have been fewer than 516 since some  
> of these terms appear as narrower terms of other concepts) and thus  
> a different hierarchy from the original version of the thesaurus.
Aha!  I'm sure I'll leave this fine point to the experts, but I would  
have thought that a "TopConcept" is one which is at the top of a  
connection-hierarchy (after being chastened, I won't say  
"ontological").   If there is a concept "gummi bears" but no "BT  
candy" then the authors of the vocabulary have obviously left "candy"  
out for some reason, making "gummi bears" pretty top-level to me.

Or is my naivety showing?   I assumed that hierlist.txt was simply  
their best attempt back when all of this was much more painful (yes,  
this project has now forced me to learn lots of Python, as I  
intended, but at least I'm not doing this on paper or with an Intel  
286 under DOS).

Dropping the TopConcept links to entries with no NT's is trivial; is  
dropping them the general consensus?
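
To make the two candidate rules concrete, here is a minimal Python sketch. It is not the script actually run against trex.txt. It assumes the BT links have already been parsed into a dict, and the names `broader`, `top_concepts` and `require_narrower` are hypothetical, as are all sample terms other than "gummi bears". The first rule treats any term with no listed BT as a TopConcept; the second, as I read the proposal above, additionally drops terms that have no NT's either.

```python
def top_concepts(broader, require_narrower=False):
    """Return the sorted list of terms that have no broader term (BT).

    broader: dict mapping each term to the set of BT's listed for it
             (hypothetical structure, assumed to be parsed beforehand).
    require_narrower: if True, also drop terms that have no NT's.
    """
    # Derive the NT links by inverting the BT links.
    narrower = {}
    for term, bts in broader.items():
        for bt in bts:
            narrower.setdefault(bt, set()).add(term)

    tops = []
    for term, bts in broader.items():
        if bts:
            continue  # has at least one BT, so it is not at the top
        if require_narrower and not narrower.get(term):
            continue  # isolated term: no BT and no NT
        tops.append(term)
    return sorted(tops)


# Illustrative vocabulary only; these entries are not taken from trex.txt.
vocab = {
    "gummi bears": set(),         # no "BT candy" was ever listed
    "stars": set(),
    "binary stars": {"stars"},
    "variable stars": {"stars"},
}

print(top_concepts(vocab))
print(top_concepts(vocab, require_narrower=True))
```

Under the first rule "gummi bears" counts as a TopConcept; with `require_narrower=True` it is dropped, because nothing lists it as a BT.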

Rick


