String character range

Fri Aug 1 09:25:36 PDT 2008

Hi.

I think your suggestion below is a good compromise. I would split it 
into two points:

1. At the SAMP protocol definition level, we might define a "string" as 
any sequence of characters in the range 0x01-0x7f, adding the escape 
convention for any printable Unicode character outside that range (so 
that it is general).

2. At the Standard Profile level, I would add further constraints, 
limiting the charset to the XML range and applying the escape convention 
to the remaining unsupported characters.
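
To make the two levels concrete, here is a rough Python sketch of how the escape convention might work. It is only an illustration, not a proposed definition. The names (PROTOCOL_SAFE, XML_SAFE, escape, unescape) are hypothetical, and I am assuming that the "XML range" within 7 bits means tab, LF, CR and 0x20-0x7f, that a literal backslash is escaped by doubling it, and that the \uXXXX form is limited to four hex digits:

```python
import re

# Assumption: point 1, the protocol level passes 0x01-0x7f through as-is.
PROTOCOL_SAFE = set(range(0x01, 0x80))
# Assumption: point 2, the 7-bit characters that XML 1.0 permits.
XML_SAFE = {0x09, 0x0A, 0x0D} | set(range(0x20, 0x80))


def escape(text, allowed):
    """Escape every character whose code point is not in `allowed`."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            # Assumption: double the backslash so decoding is unambiguous.
            out.append("\\\\")
        elif cp in allowed:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            # Four hex digits cannot express code points above U+FFFF.
            raise ValueError("cannot escape U+%X with \\uXXXX" % cp)
    return "".join(out)


_ESCAPE_RE = re.compile(r"\\(\\|u([0-9a-fA-F]{4}))")


def unescape(text):
    """Reverse escape(): decode doubled backslashes and \\uXXXX sequences."""
    def repl(match):
        if match.group(1) == "\\":
            return "\\"
        return chr(int(match.group(2), 16))
    return _ESCAPE_RE.sub(repl, text)


if __name__ == "__main__":
    # Character 31 is legal at the protocol level but not in XML.
    print(escape("a\x1fb", XML_SAFE))           # a\u001fb
    # Code point 0xf1 must be escaped at both levels.
    print(escape("ma\u00f1ana", PROTOCOL_SAFE))  # ma\u00f1ana
```

With this sketch, character 31 would travel unescaped at the protocol level but as \u001f in the Standard Profile, while code point 0xf1 would be escaped at both levels.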

Does this seem reasonable?

Luigi

> As far as SAMP goes: that character looks to me like code point 0xf1, 
> from the Latin-1 Supplement code block.  So you could not send it using 
> either the existing definition for a SAMP string or the proposal (4) 
> that I am suggesting.  If we used a variant of my suggestion (3):
> 
>   3. Define some escaping convention for un-XML characters, e.g. \u001f
>      for character 31.
> 
> with the intention that this escaping mechanism could be used for
> any 8-bit character, it would be possible to transmit this kind of 
> non-7-bit Latin character.  However, characters with the 8th bit set 
> might cause problems for certain other transports and language 
> environments.  I must admit that, apart from RFC-822 mail-type contexts, 
> I can't think of what these might be, but I'd be inclined to steer clear 
> of non-7-bit characters just in case.  However, if others (e.g. with 
> fewer Anglo-Saxon prejudices) think that it's an important requirement to 
> permit transmission of characters like this within SAMP, we could take 
> that on board.  We could even in principle say that 
> this escaping mechanism could be used to specify any Unicode character - 
> but I think that would definitely be a bad idea as it would effectively 
> restrict use of the protocol to languages with Unicode support, which 
> excludes quite a lot.
> 
> Mark