Singular vs Plural (Was: Re: Vocabularies: next steps)

Norman Gray norman at astro.gla.ac.uk
Wed Nov 21 09:32:12 PST 2007


Brian, hello.

On 2007 Nov 21, at 15:09, Brian Thomas wrote:

> On Wednesday 21 November 2007, Norman Gray wrote:
>> 2. The grammatical number of the concept names (singular or plural)
>>
>> It seems that English-language thesauri `traditionally' have concepts
>> labelled with plurals, whereas French and German ones typically have
>> concepts labelled with singular terms.
>
> With all due respect, which thesaurus are you looking at? The one on my
> shelf, "The Random House Thesaurus", published 1984, has only singular
> terms for nouns (concepts). I did a quick 8 or 9 page random survey,
> then I started looking for astronomical terms...all of the following
> appear singular:

Ah yes, I was quoting, there, and referring to thesaurus as an  
information retrieval thing, rather than thesaurus as a book of  
synonyms.  This is (British Standard) BS 8723-2 (which overlaps with  
ISO 2788 and ISO 5964), section 6.4.1:

> Different traditions exist in different languages concerning the  
> use of singulars or plurals. Indexers in some language communities,  
> for example French and German, tend to prefer the singular form so  
> that the user can approach the thesaurus or index in the same way  
> as a dictionary. In English-speaking countries, however, it is  
> usual to base the choice on whether a particular term is a count  
> noun or a non-count noun. The latter convention helps to  
> distinguish between a process such as painting, which can only be  
> expressed in the singular, and the product of the same process, in  
> this case paintings.

My argument here is not `there's a standard so we should follow  
it' (though I'm always a sucker for that sort of argument), but that  
there is a set of best practices (amongst them this singular/plural  
thing) well established by the folk who index things for a living,  
and I'm quoting these standards not as Standards, but as fairly  
precise expressions of these practices.  In other words: where we  
have a choice, it seems sensible to follow these standards, if only  
on a principle of least surprise.

In other other words: I don't claim to be advancing an overwhelmingly  
strong positive argument here, but instead disagreeing with your  
counterargument.



Interestingly, that same standard defines 'thesaurus' as a:

> controlled vocabulary in which concepts are represented by  
> preferred terms, formally organized so that paradigmatic  
> relationships between the concepts are made explicit, and the  
> preferred terms are accompanied by lead-in entries for synonyms or  
> quasi-synonyms
> NOTE  The purpose of a thesaurus is to guide both the indexer and  
> the searcher to select the same preferred term or combination of  
> preferred terms to represent a given subject.

(ISO-5964 Sect. 3.16 has a briefer, but compatible, definition)

This is interesting because, in its introduction, that document says:

> Whereas in the past thesauri were designed for information
> professionals trained in indexing and searching, today there is a
> demand for vocabularies that untrained users will find to be
> intuitive. There is also a need for search aids in contexts where
> “full text” is not available, such as museum collections and image
> databases. As the Internet and other networks allow simultaneous
> searching across resource collections that have been indexed using
> different vocabularies, there is a need to have the means of
> “translating” search queries across boundaries.

That is, here and implicitly throughout these various documents,
there's a focus on thesauri as being for _search_, and for
human-machine interactions, and that matches the actual uses of the
A&A vocabulary (where, also, all the concrete nouns are plural) and
the AOIM one (singular), and the intended use of the IAU vocabulary
(plural).

What thesaurus terms are _not_ about is machine understanding, and  
their semantics doesn't really help with that, and this is why the  
notion of broader/narrower has an operational definition ('all items  
returned by a query on a term will also be returned by a query on a  
related broader term') rather than a logical subclass relation.
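
To make that operational definition concrete, here is a minimal sketch
in Python. It treats broader/narrower purely as a statement about what
queries return, with no claim about class membership. The function
name, the terms and the result sets are all made up for illustration;
none of them come from the A&A, AOIM or IAU vocabularies.

```python
def broader_violations(results, broader_pairs):
    """Return the (narrower, broader) pairs that break the operational rule.

    `results` maps a term to the set of items a query on it returns;
    `broader_pairs` lists the asserted (narrower, broader) term pairs.
    The rule holds for a pair when every item returned for the narrower
    term is also returned for the broader one.
    """
    return [(n, b) for n, b in broader_pairs
            if not results.get(n, set()) <= results.get(b, set())]


# Made-up result sets: what a query on each term returns.
results = {
    "stars": {"paper-1", "paper-2"},
    "white dwarfs": {"paper-2"},
    "galaxies": {"paper-3"},
}

# 'white dwarfs' behaves as a narrower term of 'stars' ...
print(broader_violations(results, [("white dwarfs", "stars")]))
# ... but asserting that 'galaxies' is narrower than 'stars' fails.
print(broader_violations(results, [("galaxies", "stars")]))
```

The check only ever looks at retrieval results, which is the point: it
says nothing about whether a white dwarf 'is a' star in any logical
sense.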

> how do you label a single instance of a concept (for example,
> for later use in ontologies, creation of individuals becomes  
> difficult)?

I think the answer is that you don't.  Thesaurus/SKOS concepts are  
individuals, not classes, so that the vocabulary term 'stars' refers  
to the 'concept of stars' rather than the class of stars.  Thus

> Let's try another.."find all concepts which are stars which have
> coordinates" :
>
> PREFIX ivo: <http://ivoa.net/vocab>
> describe $s where { $s a ivo:star . $s ivo:ra $ra . $s ivo:dec $dec . }

...should return nothing at all, because the 'concept of stars'  
doesn't have an RA and Dec (only stars have those, and the 'concept  
of stars' is not itself a star).  Thus the sort of query you can  
imagine is

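prefix rm: <...registry_metadata#>
prefix vocab: <...vocabulary#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
select $resource
where {
   # resources tagged directly with the 'concept of stars'
   { $resource rm:keyword vocab:stars }
   UNION
   # resources tagged with a concept which has 'stars' as a broader concept
   { $resource rm:keyword ?kw .
     ?kw skos:broader vocab:stars .
   } }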

(in the context of some rule which has skos:broader being transitive).
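
As a rough illustration of what such a rule amounts to, here is a
sketch in Python over plain (subject, predicate, object) tuples. It is
not how any particular triple store or reasoner does it, and the
reg:r1..r4 resources and the vocab:whiteDwarfs, vocab:daWhiteDwarfs and
vocab:galaxies concepts are hypothetical, invented only to exercise the
query.

```python
def broader_closure(triples):
    """Add the triples implied by treating skos:broader as transitive."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current broader links, then look for two-step chains.
        broader = [(s, o) for s, p, o in closed if p == "skos:broader"]
        for a, b in broader:
            for c, d in broader:
                if b == c and (a, "skos:broader", d) not in closed:
                    closed.add((a, "skos:broader", d))
                    changed = True
    return closed


def resources_about(triples, concept):
    """Mirror the two UNION branches of the query above."""
    closed = broader_closure(triples)
    # Branch 1: resources whose keyword is the concept itself.
    direct = {s for s, p, o in closed if p == "rm:keyword" and o == concept}
    # Branch 2: resources whose keyword has the concept as a broader concept.
    narrower = {s for s, p, o in closed if p == "skos:broader" and o == concept}
    indirect = {s for s, p, o in closed if p == "rm:keyword" and o in narrower}
    return direct | indirect


# Hypothetical registry entries and vocabulary links.
TRIPLES = {
    ("reg:r1", "rm:keyword", "vocab:stars"),
    ("reg:r2", "rm:keyword", "vocab:whiteDwarfs"),
    ("reg:r3", "rm:keyword", "vocab:daWhiteDwarfs"),
    ("reg:r4", "rm:keyword", "vocab:galaxies"),
    ("vocab:whiteDwarfs", "skos:broader", "vocab:stars"),
    ("vocab:daWhiteDwarfs", "skos:broader", "vocab:whiteDwarfs"),
}

print(sorted(resources_about(TRIPLES, "vocab:stars")))
```

Without the closure step, reg:r3 would be missed, since its keyword is
two skos:broader hops away from vocab:stars.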

This is, I realise, returning to the older discussion of vocabulary  
vs. ontology.

You also said:

> I feel that by defining terms in the plural, we would be crippling any
> machine use of the document (which is my understanding of its primary
> reason for being)

But that's what an ontology's for, not a thesaurus.  It's clear that  
an ontology for a subject area would allow different and valuable  
functionality, and it's clear that there would be close links between  
an astronomy ontology and, for example, the IAU thesaurus (in  
whichever version...), but it is also clear that a thesaurus is  
addressing a distinct, human-centred, and relatively simple problem,  
as distinct from the machine-centred problem that would need a full  
ontology.



>
>> That's according to  ISO-5964.
>
> The copy I found is "Guidelines for the establishment and development
> of multilingual thesauri" which I checked out at :
> http://www.collectionscanada.ca/iso/tc46sc9/standard/5964e.htm#3.
> I tried looking at this for insight, but a quick read didn't reveal any
> information germane to the plural vs singular concept definition issue.
> Can you give a better pointer?

ISO-5964 Sect. 11.1.3 talks about this (but appears not to be  
included in that excerpt; bah! I wish ISO didn't seek to profit from  
their blasted standards, and sell them for what appears to be £2/$4  
per page!).



[There's a quite separate potential problem here in that UCDs are not  
really concepts, but types.  I wonder if this will get us into  
trouble in the future....]

All the best,

Norman


-- 
------------------------------------------------------------
Norman Gray  :  http://nxg.me.uk
eurovotech.org  :  University of Leicester, UK




