Format of tokens

Alasdair Gray agray at dcs.gla.ac.uk
Mon Nov 12 07:40:51 PST 2007


Hi,

 

I would like to follow up on my mail earlier today. I have done more
detailed analysis of the IAU93 Thesaurus, this time going back to the
original source files and going through the terms one by one. (Yes, I'm
a bit cross-eyed now.) This has resulted in the following:

 

Number of            IAU original files   IAU93 Rick's SKOS Model
-------------------  -------------------  -----------------------
Terms [1]            2950                 (No equivalent in SKOS)
Top level concepts   516                  1720
Concepts             2552 [2]             2947
Alternative labels   398                  858 [3]

[1] This includes terms that become concepts and those which become
alternative labels.

[2] This total does not include those terms which declare a Use
relationship. This is because these terms should appear in the SKOS
model as skos:altLabel.

[3] This total probably includes declared synonyms and it probably is less
important for them to be the same.
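
The split described in notes [1] and [2] can be sketched in a few lines
of Python. This is only an illustration: the input structures (a list of
terms and a dict of Use relationships) and the function name are
assumptions, not the format of the IAU source files. The totals in the
table are at least consistent with this split: 2552 + 398 = 2950.

```python
def partition_terms(terms, use):
    """Split thesaurus terms into concepts and alternative labels.

    terms: list of term strings (hypothetical input format)
    use:   dict mapping a term that declares a Use relationship to the
           list of preferred terms it points at (hypothetical format)
    """
    # A term that declares Use does not become a concept itself.
    concepts = [t for t in terms if t not in use]

    # Instead it becomes an alternative label on each term it points at.
    alt_labels = {}
    for term, targets in use.items():
        for target in targets:
            alt_labels.setdefault(target, []).append(term)
    return concepts, alt_labels


# The example from the list of issues further down in this mail.
terms = ["ab variable stars", "Bailey Types", "RR Lyrae Stars"]
use = {"ab variable stars": ["Bailey Types", "RR Lyrae Stars"]}
concepts, alt_labels = partition_terms(terms, use)
```

Here "ab variable stars" ends up as an alternative label on both
"Bailey Types" and "RR Lyrae Stars" and is not counted as a concept.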

 

(Note, I have not been able to do a full analysis of the relationships
due to the format that the IAU is available in and the limits of time.)

 

My concern is that there is a discrepancy between Rick's SKOS model
generated by his script and the original files. My feeling is that the
SKOS model representing the IAU Thesaurus that is to be published by the
IVOA should be an accurate model. If we cannot produce an accurate SKOS
model but claim that it is, then people will not trust the IVOAT or any
of the semantics work involving vocabularies and ontologies.

 

Specific issues that need to be addressed in the SKOS model:

*         What should be the base URI for the thesauri? Can we formalise
this work within the semantics group and give the thesauri a home within
the IVOA domain?

*         Looking at the namespace imports, rdfs, owl and iau93 are not
used within the document.

*         The declared top level concepts should accurately match those
of the original IAU Thesaurus. (At the moment Rick's script does not
generate anything close to the proper model here.)

*         The relationships within each concept need to point to other
concepts. (Although Rick has sorted this out, the version on the web is
still wrong.)

*         The 398 terms which declare Use relationships should only
appear as skos:altLabel. For example "ab variable stars" should not
appear as a concept but as an alternative label for "Bailey Types" and
"RR Lyrae Stars".

*         Agreement on the format of labels. At the moment Rick has left
them as they appear in the thesaurus files but I feel that it would be
more user-friendly to use lower case with the first word capitalised.

*         Agreement on the format of identifiers. The options that have
been considered are:

1.       Generating a new unique identifier, e.g. some number

2.       Using camel back notation based on the preferred label, so
"Bailey Types" would have the identifier "BaileyTypes"

3.       Using a lower case only version of the preferred label, so
"Bailey Types" would have the identifier "baileytypes"

Please see the appropriate thread in the semantics list for a full
discussion of this issue.
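
To make the label and identifier options concrete, here is a minimal
Python sketch of options 2 and 3 and of the proposed label format. The
function names and the rule for splitting a label into words are
assumptions made for illustration; nothing here is taken from Rick's
script.

```python
import re


def _words(pref_label):
    # Assumption: a "word" is any run of ASCII letters or digits.
    return re.findall(r"[A-Za-z0-9]+", pref_label)


def camel_identifier(pref_label):
    """Option 2: camel back notation based on the preferred label."""
    return "".join(w[:1].upper() + w[1:].lower() for w in _words(pref_label))


def lower_identifier(pref_label):
    """Option 3: lower case only version of the preferred label."""
    return "".join(w.lower() for w in _words(pref_label))


def display_label(pref_label):
    """Label format: lower case with the first word capitalised."""
    text = " ".join(pref_label.split()).lower()
    return text[:1].upper() + text[1:]


print(camel_identifier("Bailey Types"))   # BaileyTypes
print(lower_identifier("Bailey Types"))   # baileytypes
print(display_label("BAILEY TYPES"))      # Bailey types
```

One caveat with rules this naive: acronyms and proper names are
flattened, so "RR Lyrae Stars" becomes "RrLyraeStars" as an identifier
and "Rr lyrae stars" as a label. Labels that differ only in punctuation
or spacing would also collide under options 2 and 3.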

 

Once we have agreement on these issues, then the results can be applied
to the IVOAT.

 

Another issue that my analysis of the IAU Thesaurus has shown up today
is that there are minor discrepancies between the text files distributed
from http://www.aao.gov.au/lib/thesaurus.html and the web version
available from http://msowww.anu.edu.au/library/thesaurus/english/. We
should probably decide which we are using as the definitive version for
our work. (This only affects at most half a dozen entries.)

 

I'll make the lists of

*         Terms

*         Top level terms

*         Concepts

*         Alternative labels

available, once I've had a chance to put them into an appropriate
format.

 

Cheers (I think I'm going to go for a long drink to recover from this),

 

Alasdair

 

 

Alasdair J G Gray <http://www.dcs.gla.ac.uk/~agray/> 

Research Associate: Explicator Project

http://explicator.dcs.gla.ac.uk

Computer Science, University of Glasgow

0141 330 6292

 

From: owner-semantics at eso.org [mailto:owner-semantics at eso.org] On Behalf
Of Frederic V. Hessman
Sent: 12 November 2007 14:37
To: IVOA semantics; IVOA VOEvent List
Subject: Re: Format of tokens

 

 

On 12 Nov 2007, at 11:18 am, Alasdair Gray wrote:

	I think there might be a slight problem with your script that
generates the vocabularies. If you look at any relationship, you will
find that it points to itself rather than another concept.

Whoops!  Fortunately, an easy fix.  Everything should be ok now (only
affected the RDF files).

I have done a quick analysis of the IAU93 and IVOAT vocabularies. 

Number of                   IAU93   IVOAT
--------------------------  ------  ------
Top level concepts          1720    1203
Concepts                    2947    2892
Broader relationships       1716    2307
Narrower relationships      1716    2307
Associative relationships   7647    8040

 

The major difference between IAU93 and IVOAT (other than the deletion of
some errors and inclusion of new concepts) is that 1) many top level
concepts in IAU93 were removed by moving things to aliases or noting the
obvious BT's and NT's and 2) the number of BT's perfectly matches the
number of NT's.
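
Equal totals of BT's and NT's are necessary for the two sets of links to
mirror each other, but they do not by themselves show that every link
has its inverse. A pairwise check, together with a check for the
self-pointing relationships mentioned earlier in this thread, could look
roughly like the sketch below. The dict-of-sets representation and the
"Variable Stars" concept are made up for the example.

```python
def missing_inverses(broader, narrower):
    """List BT/NT links whose inverse link is not declared.

    broader:  dict mapping a concept to the set of its broader concepts
    narrower: dict mapping a concept to the set of its narrower concepts
    Returns sorted tuples (missing_kind, from_concept, to_concept).
    """
    missing = []
    for child, parents in broader.items():
        for parent in parents:
            if child not in narrower.get(parent, set()):
                missing.append(("NT", parent, child))
    for parent, children in narrower.items():
        for child in children:
            if parent not in broader.get(child, set()):
                missing.append(("BT", child, parent))
    return sorted(missing)


def self_links(relations):
    """List concepts whose relationship points back at themselves."""
    return sorted(c for c, targets in relations.items() if c in targets)


# Hypothetical data: the BT is declared but its NT counterpart is not.
broader = {"RR Lyrae Stars": {"Variable Stars"}}
narrower = {"Variable Stars": set()}
print(missing_inverses(broader, narrower))
print(self_links({"Bailey Types": {"Bailey Types"}}))
```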

 

The number of top level concepts in IVOAT could easily be halved or more
if we went to a modest bit of editing effort.

 

Rick
