Format of tokens (was Re: Fwd: Re: IVOA Thesaurus)

Douglas Burke dburke at cfa.harvard.edu
Thu Nov 1 13:49:22 PDT 2007


Brian Thomas wrote:
> On Thursday 01 November 2007 1:06:55 pm Frederic V. Hessman wrote:
>> At the time, there were lots of voices saying that, while you are  
>> perfectly correct (and I'd prefer to have them as humanly readable as  
>> possible), the realities of computer-based parsing mean that a  
>> trivial token format costs less pain.
>>
>> How about an official show of hands?
> 
> Could we have the arguments against human readable again first, before voting?


Brian,

Norman wrote the following in an email on Oct 10 - Versions and 
namespaces (was: Vocab AND Ontology?) - where >> indicates a quote from 
Rick.

HTH,
Doug


>> I personally find the revamped token list to be much more
>> palatable (which is obviously why I did it), being nearly human-usable
>> (I don't like to be shouted at by capitalized tokens) and with implicit
>> additional info (e.g. formal names of people and objects).

Doug brought up the issue of how to generate the concept names, as URI 
fragments.  This is a stylistic point, but I think an important one.

I'd like to suggest a rather drastic canonicalisation, so that "He+ 
ionization zone" would turn into #heionizationzone.  This is a pragmatic 
middle ground between having the concept name mirror the label, and 
having it fully opaque (such as #concept12345).

Having it consist of only lowercase alpha means (a) we're guaranteed to 
avoid any parsing troubles, with RDF parsers or with anything else; (b) 
it's clear to anyone looking at this that they're not supposed to be 
displaying the concept name, but using the concept's 'Label' and 
declared relationships instead; while (c) it retains some mnemonic value.
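
As a rough illustration of the canonicalisation suggested above, here is a
minimal Python sketch. The function name concept_fragment is hypothetical,
and the rule it applies (lowercase the label, then drop everything that is
not an ASCII letter a-z) is only one possible reading of "only lowercase
alpha"; it is not an agreed IVOA procedure.

```python
import re


def concept_fragment(label: str) -> str:
    """Turn a human-readable label into a lowercase-alpha URI fragment.

    Hypothetical helper: lowercases the label and removes every character
    outside a-z (spaces, digits, punctuation, accented letters).
    """
    letters = re.sub(r"[^a-z]", "", label.lower())
    if not letters:
        # A label with no ASCII letters would yield an empty fragment.
        raise ValueError(f"label {label!r} contains no ASCII letters")
    return "#" + letters


if __name__ == "__main__":
    print(concept_fragment("He+ ionization zone"))
```

Note that a rule this drastic can map distinct labels to the same fragment
(for example, labels differing only in digits or punctuation), so any real
use would presumably need a uniqueness check over the whole vocabulary.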

There is a case which can be made for having fully opaque concept names 
(this is what's done in the Gene Ontology, for example): it's point (b) 
above, plus it removes any temptation to argue about relationships based 
on the name alone.  Despite that, I think there's value in making it at 
least partly human-recognisable.



