FW: Format of tokens
Alasdair Gray
agray at dcs.gla.ac.uk
Wed Nov 14 02:54:53 PST 2007
Sorry, forgot to include everyone in the reply.
Alasdair
Alasdair J G Gray <http://www.dcs.gla.ac.uk/~agray/>
Research Associate: Explicator Project
http://explicator.dcs.gla.ac.uk
Computer Science, University of Glasgow
0141 330 6292
From: Alasdair Gray
Sent: 14 November 2007 10:24
To: 'Frederic V. Hessman'
Subject: RE: Format of tokens
Hi Rick, All,
Comments in line below preceded by [AG]
From: owner-semantics at eso.org [mailto:owner-semantics at eso.org] On Behalf
Of Frederic V. Hessman
Sent: 13 November 2007 17:09
To: IVOA semantics
Subject: Re: Format of tokens
My concern is that there is a discrepancy between Rick's SKOS
model generated by his script and the original files. My feeling is that
the SKOS model representing the IAU Thesaurus that is to be published by
the IVOA should be an accurate model. If we cannot produce an accurate
SKOS model but claim that it is, then people will not trust the IVOAT or
any of the semantics work involving vocabularies and ontologies.
The problem was simply that I had forgotten to delete the entries which
turned into aliases. The real raw statistics are
Number of initial entries: 2950
[AG] Glad to see that we agree on this figure.
Number of explicit narrower entries (with BTs): 1226
Number of explicit broader entries (with NTs): 512
Number of entries with references (with RTs): 2134
Final number of SKOS Concepts: 2551
[AG] We are within 1 here, which could be an error on my part.
Number of TopConcepts: 1325
[AG] I do not agree with this figure (see next comment).
Thus, you can't assume that the BTs and NTs are all present in the
original (trex.txt). Alasdair's figure of 512 top concepts assumed that
the IAU thesaurus was reasonably complete and self-consistent.
[AG] I cannot claim to have looked closely at the BT/NT relationships in
the original (trex.txt) file. However, the IAU thesaurus also issues a
hierarchy file (hierlist.txt). This file gives the hierarchy of the
original thesaurus and it is this that has 516 top level concepts. Rick
has assumed that a top level concept is one that does not have a broader
term. For the IVOAT I would agree with this as it should result in a
less confusing hierarchy that matches users' expectations. However, for
the IAU93 this is wrong as it results in a different number of top level
concepts (although I would have thought that it would have been fewer
than 516 since some of these terms appear as narrower terms of other
concepts) and thus a different hierarchy from the original version of
the thesaurus.
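To illustrate why the two counts can differ, here is a minimal sketch of the "no broader term" rule compared against an explicitly declared top level. The terms, relations, and the `declared_top` set are invented toy data, and the function name is hypothetical; the real trex.txt and hierlist.txt formats are not reproduced here.

```python
# Hypothetical toy data: each term maps to its list of broader terms (BTs).
# These relations are invented for illustration, not taken from the thesaurus.
broader = {
    "STARS": [],
    "VARIABLE STARS": ["STARS"],
    "ORPHAN TERM": [],  # no BT recorded, so the rule treats it as top level
}

# Hypothetical top level as an explicit hierarchy listing might declare it.
declared_top = {"STARS"}


def derived_top_concepts(broader_terms):
    """Return the set of terms that have no broader term at all."""
    return {term for term, bts in broader_terms.items() if not bts}


derived = derived_top_concepts(broader)
extra = sorted(derived - declared_top)

print(sorted(derived))  # terms the no-BT rule treats as top concepts
print(extra)            # terms the declared hierarchy does not list as top level
```

Any term whose BT is simply missing from the source data ends up in `extra`, which is one way the derived count could exceed the declared one.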
* The declared top level concepts should accurately match those
of the original IAU Thesaurus. (At the moment Rick's script does not
generate anything close to the proper model here.)
Well, better than you thought and better now that I've found the
(latest) bug.
[AG] I'm afraid I have to disagree with you here.
* The relationships within each concept need to point to other
concepts. (Although Rick has sorted this out, the version on the web is
still wrong.)
[AG] Now correct on the web version too :-)
* The 398 terms which declare Use relationships should only
appear as skos:altLabel. For example "ab variable stars" should not
appear as a concept but as an alternative label for "Bailey Types" and
"RR Lyrae Stars".
This problem is solved (it was the bug).
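As a rough sketch of the behaviour being asked for (not Rick's actual script), entries that declare a Use relationship can be dropped as concepts and attached to their targets as alternative labels. The function name and the dictionary layout are assumptions made for illustration; only the "ab variable stars" example comes from the discussion above.

```python
def fold_use_terms(terms, use):
    """Map each remaining concept to its list of alternative labels.

    terms: iterable of all entry labels.
    use:   dict mapping a non-preferred label to the preferred labels
           it points at through a Use relationship.
    """
    # Only entries without a Use relationship stay as concepts.
    alt_labels = {term: [] for term in terms if term not in use}
    for non_preferred, targets in use.items():
        for target in targets:
            alt_labels.setdefault(target, []).append(non_preferred)
    return alt_labels


terms = ["ab variable stars", "Bailey Types", "RR Lyrae Stars"]
use = {"ab variable stars": ["Bailey Types", "RR Lyrae Stars"]}

concepts = fold_use_terms(terms, use)
for label, alts in concepts.items():
    print(label, "altLabel:", alts)
```

In a SKOS serialisation each entry in `alts` would become a skos:altLabel on the corresponding concept.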
* Agreement on the format of labels. At the moment Rick has left
them as they appear in thesaurus files but I feel that it would be more
user-friendly to use lower case with the first word capitalised.
Frankly, the original document uses (practically) all capitals and we
want to convert the original thesaurus using as few changes as necessary
(the only point of doing it), so why not keep the original labels? If
people hate to be shouted at and think that the IAU93 isn't very
user-friendly, all the better. Any other format will have problems:
e.g. you don't really want to turn "BAADE WESSELINK METHOD" into "Baade
wesselink method" - you want people to use the IVOAT and see
"Baade-Wesselink method".
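The objection can be reproduced with a naive sentence-case conversion. This is only a sketch of one obvious way to do "lower case with the first word capitalised"; the function name is hypothetical and no such conversion is part of the existing script.

```python
def sentence_case(label):
    """Lower-case a label and capitalise only its first character."""
    return label.capitalize()


original = "BAADE WESSELINK METHOD"
converted = sentence_case(original)
print(converted)

# The proper noun and the hyphen cannot be recovered automatically.
print(converted == "Baade-Wesselink method")
```

The conversion yields "Baade wesselink method", so getting to "Baade-Wesselink method" would need manual editing or a list of proper nouns.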
* Agreement on the format of identifiers. The options
that have been considered are:
1. Generating a new unique identifier, e.g. some number
2. Using camel back notation based on the preferred label,
so "Bailey Types" would have the identifier "BaileyTypes"
3. Using a lower case only version of the preferred label,
so "Bailey Types" would have the identifier "baileytypes"
Please see the appropriate thread in the semantics list for a
full discussion of this issue.
... from which you'll see that there are few people who really care. I
still haven't seen any recent complaints about compromise notation #2
but previous stronger complaints about #1 and #3. Barring complaints,
can we simply adopt #2? There is no perfect solution (e.g. "Ba II
stars" -> "BaIiStars", which looks like something else).
[AG] I am happy with option 2 also.
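For reference, options 2 and 3 can be sketched as below. The function names are hypothetical and this is one straightforward reading of the two notations, not the code used to generate the models.

```python
def camel_identifier(label):
    """Option 2: capitalise each word of the preferred label and join them."""
    return "".join(word.capitalize() for word in label.split())


def lower_identifier(label):
    """Option 3: lower-case the preferred label and remove the spaces."""
    return "".join(label.lower().split())


for label in ("Bailey Types", "Ba II stars"):
    print(label, "->", camel_identifier(label), "/", lower_identifier(label))
```

With this reading "Bailey Types" becomes "BaileyTypes" under option 2 and "baileytypes" under option 3, and "Ba II stars" becomes "BaIiStars" under option 2, which is the awkward case mentioned above.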
Once we have agreement on these issues, then the results can be applied
to the IVOAT.
... and the rest of the thesauri we're going to generate in this
exercise.
Cheers (I think I'm going to go for a long drink to recover from
this),
Now you all know how many beers you all owe me.
[AG] Absolutely. I am extremely appreciative of all the work that you
have put into generating these models.
Alasdair
Rick