String character range

Luigi Paioro luigi at lambrate.inaf.it
Thu Aug 28 08:16:02 PDT 2008


Dear Mark, Doug and all,

   in reply to this mail

> This is a coherent suggestion and it could be done.  However in my 
> opinion it's not the best way to go.  While making the protocol as
> general and flexible as possible sounds like a good thing, the price
> that you pay is a reduction in interoperability.  If the protocol
> says that SAMP strings can only ever contain characters 0xA, 0xD and 
> 0x20-7F (or whatever) then you know that if you can handle those 
> characters then you can definitely interoperate with anyone else
> speaking the protocol.  If the protocol says that any UTF-8 character
> is permitted then someone trying to write middleware that does
> translation between the far future perverted Ice-based profile and the
> current Standard Profile will have a problem.  Is that kind of 
> middleware something we're going to need?  I don't know.  But in 
> weighing up how we ought to plan for unknown future evolutions,
> I would rather err on the side of safety than of flexibility.

I must admit that I hadn't considered the possibility of a 
multi-profile hub, and hence the need for translation. For 
interoperability reasons, it is probably logical to assume that every 
SAMP hub implementation MUST support at least the Standard Profile, and 
then possibly other profiles or extensions. Therefore the XML limits 
should be taken into account at the abstract API level. Sure.

Anyway, as I said in a previous mail, I don't think that UTF-8 support 
is really important, and in 99% of cases ASCII with the stated limits 
is probably adequate, so I won't insist.

However, Doug's problem with that VOTable suggested to me a possible 
scenario that could require UTF-8 support (with the XML constraints), 
perhaps by introducing an additional data type. This is the scenario: 
using a VO-enabled application, I get from an SSA (TAP, SIA, whatever) 
service a VOTable which contains UTF-8 characters (no matter which); 
after some processing I broadcast it asynchronously to one or more 
other applications using SAMP. This simple operation can be done in 
two ways:

i) by reference: the VOTable is written to a file (local or remote) and 
the reference to that file is sent with a proper MType as a simple 
ASCII string (e.g. "file:///tmp/myvotab.vot", 
"ivo://my.vospace.address/myvotab.vot", etc.)

ii) by value: the content of the VOTable is sent as a byte stream, still 
using a proper MType. This byte stream can simply be a UTF-8 encoded 
string.
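
As a rough illustration of the two options, here is a minimal Python 
sketch of what the two messages might look like as plain maps. It assumes 
the usual SAMP message layout ("samp.mtype" plus "samp.params") and the 
table.load.votable MType with its "url" parameter for case i). The MType 
and the "votable" parameter used for case ii) are hypothetical, invented 
only for this example, and the sample VOTable content is made up.

```python
# Sketch only: two ways of handing a VOTable to another application.

def by_reference_message(url):
    """Case i): send only a reference (an ASCII URL) to the VOTable."""
    return {
        "samp.mtype": "table.load.votable",
        "samp.params": {"url": url},
    }

def by_value_message(votable_text):
    """Case ii): send the VOTable content itself.

    The MType name and the "votable" parameter are hypothetical.
    """
    return {
        "samp.mtype": "x-example.table.load.votable-inline",
        "samp.params": {"votable": votable_text},
    }

# Made-up VOTable fragment containing one non-ASCII character (u-umlaut).
votable_text = '<VOTABLE><INFO name="observer" value="M\u00fcller"/></VOTABLE>'

ref_msg = by_reference_message("file:///tmp/myvotab.vot")
val_msg = by_value_message(votable_text)
```

In the first message every string is a short ASCII URL; in the second the 
parameter value carries whatever characters the VOTable happens to contain.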

Case i) requires only ASCII support, while ii) requires support for 
UTF-8 (or at least letting it pass through) or an additional general 
data type for byte streams (which I suspect could be useful for other 
purposes too).
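
To make the difference concrete, here is a small Python sketch that checks 
a string against the restricted character set mentioned in the quoted mail 
(0xA, 0xD and 0x20-7F, which was given there only tentatively). The 
function name and the sample VOTable content are assumptions made for this 
example and are not part of any SAMP library.

```python
# Sketch only: does a string fit the tentatively proposed SAMP character range?

def is_restricted_samp_string(text):
    """True if every character is 0x0A, 0x0D or lies in 0x20-0x7F."""
    return all(
        ord(ch) in (0x0A, 0x0D) or 0x20 <= ord(ch) <= 0x7F
        for ch in text
    )

# Case i): the reference is a plain ASCII URL.
reference = "file:///tmp/myvotab.vot"

# Case ii): made-up VOTable content with one non-ASCII character (u-umlaut).
content = '<VOTABLE><INFO name="observer" value="M\u00fcller"/></VOTABLE>'

reference_ok = is_restricted_samp_string(reference)
content_ok = is_restricted_samp_string(content)

# Sent by value, the content would travel as these UTF-8 bytes.
utf8_bytes = content.encode("utf-8")
```

Here the reference passes the check, while the content fails it because of 
a single character outside the range, so it could only be passed by value 
if UTF-8 were allowed through or a byte-stream type were available.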

If only the ASCII charset (with the stated limits) is allowed in a SAMP 
message, then only case i) is possible. If someone wants to be able to 
pass data by value (case ii), then I think the discussion will still be 
a long one...


Luigi


