FW: Format of tokens

Wed Nov 14 02:54:53 PST 2007

Sorry, forgot to include everyone in the reply.

Alasdair

Alasdair J G Gray <http://www.dcs.gla.ac.uk/~agray/> 

Research Associate: Explicator Project

http://explicator.dcs.gla.ac.uk

Computer Science, University of Glasgow

0141 330 6292

From: Alasdair Gray 
Sent: 14 November 2007 10:24
To: 'Frederic V. Hessman'
Subject: RE: Format of tokens

Hi Rick, All,

Comments in line below preceded by [AG]

From: owner-semantics at eso.org [mailto:owner-semantics at eso.org] On Behalf
Of Frederic V. Hessman
Sent: 13 November 2007 17:09
To: IVOA semantics
Subject: Re: Format of tokens

	My concern is that there is a discrepancy between Rick's SKOS
model generated by his script and the original files. My feeling is that
the SKOS model representing the IAU Thesaurus that is to be published by
the IVOA should be an accurate model. If we cannot produce an accurate
SKOS model but claim that it is, then people will not trust the IVOAT or
any of the semantics works involving vocabularies and ontologies.

The problem was simply that I had forgotten to delete the entries which
turned into aliases.  The real raw statistics are

            Number of initial entries:  2950

[AG] Glad to see that we agree on this figure.

            Number of explicit narrower entries (with BTs):  1226

            Number of explicit broader  entries (with NTs):  512

            Number of entries with references   (with RTs):  2134

            Final number of SKOS Concepts:  2551

[AG] We are within 1 here, which could be an error on my part.

            Number of TopConcepts:  1325

[AG] I do not agree with this figure (see next comment).
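A quick cross-check on the concept count. It assumes that each of the 398 entries declaring a Use relationship (see the bullet on Use relationships) is dropped as a concept and kept only as an altLabel, and that no other entries are merged or dropped:

$$
2950 - 398 = 2552
$$

That is one more than the 2551 SKOS Concepts reported above, in line with the "within 1" remark.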

Thus, you can't assume that the BTs and NTs are all present in the
original (trex.txt).  Alasdair's figure of 512 top concepts assumed that
the IAU thesaurus was reasonably complete and self-consistent.

[AG] I cannot claim to have looked closely at the BT/NT relationships in
the original (trex.txt) file. However, the IAU thesaurus also issues a
hierarchy file (hierlist.txt). This file gives the hierarchy of the
original thesaurus, and it is this hierarchy that has 516 top-level
concepts. Rick has assumed that a top-level concept is one that does not
have a broader term. For the IVOAT I would agree with this, as it should
result in a less confusing hierarchy that matches users' expectations.
However, for the IAU93 this is wrong: it results in a different number
of top-level concepts (although I would have expected fewer than 516,
since some of these terms appear as narrower terms of other concepts)
and thus a different hierarchy from the original version of the
thesaurus.
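A minimal sketch of the "no broader term" rule, assuming the BT and NT entries have already been parsed into dictionaries. The function name, data layout and toy terms are hypothetical and do not come from Rick's script or from trex.txt. It shows why incomplete BTs matter: a term that only appears on another entry's NT list looks like a top concept unless the NT relation is also inverted. Neither count is meant to reproduce the 516 top-level concepts of hierlist.txt, which would have to be read from that file directly.

```python
from typing import Dict, Set


def top_concepts(terms: Set[str],
                 bt: Dict[str, Set[str]],
                 nt: Dict[str, Set[str]]) -> Set[str]:
    """Return the terms that have no broader term.

    A term counts as having a broader term if it lists an explicit BT,
    or if it appears in some other term's NT list (the inverse relation).
    """
    has_broader = {term for term, parents in bt.items() if parents}
    for children in nt.values():
        has_broader |= children
    return terms - has_broader


# Hypothetical toy data, NOT taken from trex.txt or hierlist.txt.
terms = {"STARS", "VARIABLE STARS", "RR LYRAE STARS", "GALAXIES"}
bt = {"VARIABLE STARS": {"STARS"}}           # explicit BT only
nt = {"VARIABLE STARS": {"RR LYRAE STARS"}}  # NT with no matching BT

print(sorted(top_concepts(terms, bt, {})))  # ['GALAXIES', 'RR LYRAE STARS', 'STARS']
print(sorted(top_concepts(terms, bt, nt)))  # ['GALAXIES', 'STARS']
```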

*         The declared top level concepts should accurately match those
of the original IAU Thesaurus. (At the moment Rick's script does not
generate anything close to the proper model here.)

Well, better than you thought and better now that I've found the
(latest) bug.

[AG] I'm afraid I have to disagree with you here.

*         The relationships within each concept need to point to other
concepts. (Although Rick has sorted this out, the version on the web is
still wrong.)

[AG] Now correct on the web version too :-)

*         The 398 terms which declare Use relationships should only
appear as skos:altLabel. For example "ab variable stars" should not
appear as a concept but as an alternative label for "Bailey Types" and
"RR Lyrae Stars".

This problem is solved (it was the bug).
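A minimal sketch of that fix, assuming the Use entries have been parsed into a dictionary from alias to preferred terms. The names are hypothetical, not taken from Rick's script. It uses the example from the bullet above: "ab variable stars" stops being a concept and becomes an alternative label on both of its targets.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def fold_use_entries(entries: Set[str],
                     use: Dict[str, List[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Drop every entry that declares a Use relationship from the concept set
    and record it as an alternative label (skos:altLabel) on each target."""
    concepts = entries - set(use)
    alt_labels: Dict[str, Set[str]] = defaultdict(set)
    for alias, targets in use.items():
        for target in targets:
            if target not in concepts:
                # A Use pointing at another alias or a missing term needs manual review.
                raise KeyError(f"Use target {target!r} of {alias!r} is not a concept")
            alt_labels[target].add(alias)
    return concepts, dict(alt_labels)


entries = {"ab variable stars", "Bailey Types", "RR Lyrae Stars"}
use = {"ab variable stars": ["Bailey Types", "RR Lyrae Stars"]}
concepts, alt_labels = fold_use_entries(entries, use)
print(sorted(concepts))                    # ['Bailey Types', 'RR Lyrae Stars']
print(sorted(alt_labels["Bailey Types"]))  # ['ab variable stars']
```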

*         Agreement on the format of labels. At the moment Rick has left
them as they appear in the thesaurus files, but I feel that it would be
more user-friendly to use lower case with the first word capitalised.

Frankly, the original document uses (practically) all capitals and we
want to convert the original thesaurus using as few changes as necessary
(the only point of doing it), so why not keep the original labels?  If
people hate to be shouted at and think that the IAU93 isn't very
user-friendly, all the better.  Any other format will have problems:
e.g. you don't really want to turn "BAADE WESSELINK METHOD" into "Baade
wesselink method" - you want people to use the IVOAT and see
"Baade-Wesselink method". 
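The objection can be seen with the obvious mechanical conversion. This sketch uses Python's str.capitalize, which is an assumption about how the conversion would be done and not something proposed in the thread:

```python
def sentence_case(label: str) -> str:
    """Lower-case the label and capitalise only its first character."""
    return label.capitalize()


print(sentence_case("BAADE WESSELINK METHOD"))  # Baade wesselink method
```

The proper name "Wesselink" loses its capital, and nothing in the upper-case label says where the hyphen in "Baade-Wesselink" belongs, so the readable form needs a hand-edited label.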

	*         Agreement on the format of identifiers. The options
that have been considered are:

	1.       Generating a new unique identifier, e.g. some number

	2.       Using camel back notation based on the preferred label,
so "Bailey Types" would have the identifier "BaileyTypes"

	3.       Using a lower case only version of the preferred label,
so "Bailey Types" would have the identifier "baileytypes"

	Please see the appropriate thread in the semantics list for a
full discussion of this issue. 

... from which you'll see that there are few people who really care.  I
still haven't seen any recent complaints about compromise notation #2,
but there were earlier, stronger complaints about #1 and #3.  Barring
complaints, can we simply adopt #2?  There is no perfect solution (e.g.
"Ba II stars" -> "BaIiStars", which looks like something else).

[AG] I am happy with option 2 also.
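A minimal sketch of options 2 and 3, assuming that any character other than an ASCII letter or digit separates words (an assumption; labels with non-ASCII letters would need a different rule). The function names are hypothetical. It reproduces both examples from the thread, including the awkward "BaIiStars".

```python
import re
from typing import List


def _words(pref_label: str) -> List[str]:
    # Assumption: anything that is not an ASCII letter or digit separates words.
    return [w for w in re.split(r"[^0-9A-Za-z]+", pref_label) if w]


def camel_back_id(pref_label: str) -> str:
    """Option 2: capitalise each word of the preferred label and join them."""
    return "".join(w.capitalize() for w in _words(pref_label))


def lower_case_id(pref_label: str) -> str:
    """Option 3: join the words of the preferred label and lower-case them."""
    return "".join(_words(pref_label)).lower()


print(camel_back_id("Bailey Types"))  # BaileyTypes
print(camel_back_id("Ba II stars"))   # BaIiStars
print(lower_case_id("Bailey Types"))  # baileytypes
```

Capitalising every word is what turns the Roman numeral in "Ba II stars" into "Ii". Leaving words that are already capitalised untouched would avoid that, but would also carry the all-capitals labels of the original files straight into the identifiers.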

Once we have agreement on these issues, then the results can be applied
to the IVOAT.

... and the rest of the thesauri we're going to generate in this
exercise.

	Cheers (I think I'm going to go for a long drink to recover from
this),

Now you all know how many beers you all owe me.

[AG] Absolutely. I am extremely appreciative of all the work that you
have put into generating these models.

Alasdair

Rick

