String character range

Doug Tody dtody at nrao.edu
Tue Aug 19 08:52:05 PDT 2008


Hi Mark -

I also don't think this is a very important issue, but if others do,
this would be a reasonable way to provide it without compromising
legacy code which does not support UTF-8.  I would note though that
most text formats and text-oriented software I have seen in recent
years specify UTF-8 rather than ASCII, so clearly it is a widely used
standard.  In actual implementations which use standard libraries it
is probably going to be supported anyway, so while UTF-8 might not
be required it may be wise to permit it as a feature.
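
To illustrate the pass-through idea, here is a minimal Python sketch of
a receiver-side check.  It assumes a particular restricted 7-bit set
(tab, line feed, carriage return, and 0x20-0x7e); that set and the names
ALLOWED_7BIT and is_acceptable are hypothetical, chosen for illustration
and not taken from this thread.  It relies on the fact that every byte
of a multi-byte UTF-8 sequence is 0x80 or above, so the 7-bit
restriction can be applied without touching non-ASCII characters.

```python
# Hypothetical restricted 7-bit set (an assumption, not from this thread):
# tab, line feed, carriage return, and the printable range 0x20-0x7e.
ALLOWED_7BIT = frozenset([0x09, 0x0A, 0x0D]) | frozenset(range(0x20, 0x7F))


def is_acceptable(data: bytes) -> bool:
    """Return True if data is valid UTF-8 and every 7-bit byte is allowed.

    Bytes >= 0x80 belong to multi-byte UTF-8 sequences and pass through
    unchecked once the UTF-8 decoding itself has succeeded.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(b in ALLOWED_7BIT for b in data if b < 0x80)


# Pure ASCII text has the same bytes in both encodings, which is why
# code that assumes ASCII keeps working on ASCII-only UTF-8 input.
same_bytes = "plain ASCII".encode("ascii") == "plain ASCII".encode("utf-8")
```

Under these assumptions, is_acceptable("café".encode("utf-8")) accepts
the accented character, while a string containing a control character
such as b"bell\x07" is rejected.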

	- Doug


On Tue, 19 Aug 2008, Mark Taylor wrote:

> Doug,
> 
> On Mon, 4 Aug 2008, Doug Tody wrote:
> 
> > Sure, I agree that the range of allowable chars should be restricted
> > as you suggest.   My suggestion is to specify UTF-8, restricted as
> > has been discussed for 7-bit chars, but allowing UTF-8 encoded chars
> > to pass through.  That would seem to do it and we still have simple
> > ASCII virtually all of the time so I don't think this will break
> > legacy code.  If at some point full Unicode is needed (e.g. 16-bit
> > chars), that should be a different data type.
> 
> I am slightly against this, since it reduces the simplicity of what's
> going on.  In practice, as you say, I think the amount of problematic
> behaviour that defining SAMP string content as UTF-8 would cause would
> be very small.  But I've had to go to the Unicode web site and read the
> UTF-8 FAQs to convince myself that this is the case.  Sloppy programmers
> who don't carefully read the spec and treat the byte stream as if it's
> ASCII will be fine >99% of the time.  But some burden will be imposed on
> careful programmers who want to make sure that the UTF-8 is treated
> properly, especially if they are working on platforms which are not
> Unicode-aware.  If non-Latin character transmission is in the category
> "essential" or even "nice to have" I'd say this is a price worth paying.
> If it's just "because we can" I'd say it's not.  Responses so far to my
> question:
> 
> On Mon, 4 Aug 2008, Mark Taylor wrote:
> 
> > Which of these is best depends on how important the requirement to be
> > able to send Unicode and control characters is. My vote is not very.
> > Can we have a show of hands?
> 
> suggest to me that this is in the "because we can" category.  But if people
> believe that non-Latin character transmission is something
> that we really ought to have in SAMP strings, then I'd go along with
> this suggestion.
> 
> Mark
> 
> -- 
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
> 


