Voting on format of tokens (was Re: IVOA Thesaurus)
Douglas Burke
dburke at cfa.harvard.edu
Thu Nov 1 12:26:34 PDT 2007
I vote for there being some form of
normalization/canonicalization/some-other-ization of the human-readable
terms. The important parts for me are making everything lower case (I've
found too many errors in my own work from case mismatches [*]) and removing
problematic characters (or combinations of characters). I don't have a
real opinion on whether spaces should be removed or replaced by "_".
[* Are there issues with this particular choice in Ed's ontology
use-case below?]
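
To make the options concrete, here is a minimal sketch of the kind of normalization being voted on, assuming Python and its standard library. The function name normalize_term and the separator parameter are hypothetical, and the list of "problematic characters" (anything outside a-z and 0-9) is an assumption, not an agreed rule.

```python
import re
import unicodedata


def normalize_term(term: str, separator: str = "") -> str:
    """Turn a human-readable term into a simple token (illustrative only).

    separator="" gives the compact form (no spaces, no underscores);
    separator="_" gives the underscore form.
    """
    # Decompose accented characters, then drop the combining marks
    # (i.e. remove diacritical markings).
    decomposed = unicodedata.normalize("NFKD", term)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # All lower case, to avoid case-mismatch errors.
    lowered = stripped.lower()
    # Assumption: every run of characters outside a-z and 0-9 counts as
    # "problematic" and is replaced by the separator.
    token = re.sub(r"[^a-z0-9]+", separator, lowered)
    # Trim separators left over at either end.
    return token.strip(separator) if separator else token


print(normalize_term("Active Galactic Nuclei"))
print(normalize_term("Active Galactic Nuclei", separator="_"))
```

One thing the sketch shows: with the compact form, "pre_science" and "prescience" normalize to the same token, whereas the underscore form keeps them distinct.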
Doug
Frederic V. Hessman wrote:
> At the time, there were lots of voices saying that, while you are
> perfectly correct (and I'd prefer to have them as humanly readable as
> possible), the realities of computer-based parsing mean that a trivial
> token format costs less pain.
>
> How about an official show of hands?
>
> Rick
>
> On 1 Nov 2007, at 5:32 pm, Ed Shaya wrote:
>
>> Rick,
>>
>> Well, I vote to put back the underscores and the capitalization
>> where appropriate. There is no need to go out of one's way and make
>> all IDs cryptic just to make a point about the concept of tokens. In
>> ontology these become the element names of instances and it is really
>> handy to be able to readily discern what kind of instance it is by
>> looking, rather than going to some lookup table. We need some
>> prescience here, not to be confused with pre_science.
>>
>> Ed
>>
>> Frederic V. Hessman wrote:
>>>
>>> On 31 Oct 2007, at 6:54 pm, Ed Shaya wrote:
>>>
>>>> What happened to the underscores between all of the compound words?
>>>> Ed
>>>
>>> A while back, we communally decided that the tokens should be as
>>> compact and simple as possible, i.e. no caps, no diacritical marking,
>>> no spaces, no underscores, not only to make them syntactically simple
>>> but to emphasize that they are only tokens. The text file still has
>>> the underscores, but now only for historical reasons (i.e. the
>>> original SV proposal).
>>>
>>> If everyone would rather see the underscores back again, no problem.
>>>
>>> Rick
>>>
>>
>
>