String character range
m.b.taylor at bristol.ac.uk
Tue Aug 19 04:30:58 PDT 2008
Luigi, Doug and others,
sorry I've let this one go cold; I got sidetracked by something else.
On Mon, 4 Aug 2008, Luigi Paioro wrote:
> I think that Unicode chars would be rarely sent, and control chars never at
> all. Probably in 99% of cases the ASCII charset with the limitations you
> indicated is enough, so I don't have a strong position with respect to Unicode.
> Anyway, I've thought about Doug's suggestion regarding UTF-8 and I've looked
> here and there at what string encoding mechanisms other RPC systems adopt,
> like ZeroC's Ice and DBus (I've also looked for CORBA encoding, but I didn't
> succeed). Well, DBus and Ice both use UTF-8 (with no limitations). I've
> not looked at the other RPC systems (there are a plethora) but those are my
> favourite (along with XML-RPC and SOAP of course) and so I've looked there.
> Now, suppose that in the far far future, a perverted guy decides to implement
> SAMP using a different profile, for instance using Ice as wire protocol (in
> principle it should be possible) instead of XML-RPC. It would be a shame if
> such an implementation inherited the limitations coming from the XML limits.
> In my opinion the limits should be put at implementation and language level,
> not at protocol level... it should be as general (and flexible) as possible.
> So, why not follow Doug's suggestion and specify at SAMP protocol definition
> level that string serialization is in UTF-8 (in general), and specify at
> Standard Profile level that not all the UTF-8 chars are allowed but only
> those supported by XML?
This is a coherent suggestion and it could be done. However, in my
opinion it's not the best way to go. While making the protocol as
general and flexible as possible sounds like a good thing, the price
that you pay is a reduction in interoperability. If the protocol
says that SAMP strings can only ever contain characters 0x0A, 0x0D and
0x20-0x7F (or whatever) then you know that if you can handle those
characters then you can definitely interoperate with anyone else
speaking the protocol. If the protocol says that any UTF-8 character
is permitted then someone trying to write middleware that does
translation between the far future perverted Ice-based profile and the
current Standard Profile will have a problem: a string that is legal
in one profile may contain characters the other cannot carry. Is that kind of
middleware something we're going to need? I don't know. But in
weighing up how we ought to plan for unknown future evolutions,
I would rather err on the side of safety than of flexibility.
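
To make the trade-off concrete, here is a minimal Python sketch of the two
kinds of check being discussed. The function names are hypothetical and not
part of any SAMP library. The restricted range is the tentative one quoted
above (0x0A, 0x0D and 0x20-0x7F, "or whatever"), and the second check assumes
the Char production from the XML 1.0 specification as the Standard Profile
limit.

```python
# Hypothetical helpers for illustration only; not taken from any SAMP code.

# Tentative protocol-level range from the message above: 0x0A, 0x0D, 0x20-0x7F.
ALLOWED_CONTROL = {0x0A, 0x0D}


def is_restricted_samp_string(s):
    """Return True if every character is 0x0A, 0x0D, or in 0x20-0x7F."""
    return all(
        ord(c) in ALLOWED_CONTROL or 0x20 <= ord(c) <= 0x7F
        for c in s
    )


def is_xml10_char_string(s):
    """Return True if every character matches the XML 1.0 Char production."""
    def ok(cp):
        return (
            cp in (0x09, 0x0A, 0x0D)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF
        )
    return all(ok(ord(c)) for c in s)
```

With a single protocol-level range, one check is enough for every profile.
With a general UTF-8 protocol plus a per-profile limit, a translator between
profiles would need something like the second check, and would still have to
decide what to do with a string that fails it.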
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/