String character range
m.b.taylor at bristol.ac.uk
Fri Aug 1 08:15:21 PDT 2008
On Fri, 1 Aug 2008, Carlos Rodrigo Blanco wrote:
> I'm sorry that I don't know much about unicode encoding and I feel quite
> ashamed of showing this ignorance, but I wonder what happens with latin
> characters and so.
> If I have to write, for instance, some author name in a xml document that
> includes some latin character (like ñ), is that allowed?
Writing it in an XML document is no problem. XML, and Unicode on which
it is based, are very capable of representing almost any character
from almost any language you can think of (and a lot more).
As far as SAMP goes: that character looks to me like code point 0xf1,
from the Latin-1 Supplement code block. So you could not send it
using either the existing definition for a SAMP string or the
proposal (4) that I am suggesting. If we used a variant of my
proposal (3):

   3. Define some escaping convention for un-XML characters, e.g. \u001f
      for character 31.

with the intention that this escaping mechanism could be used for
any 8-bit character, it would be possible to transmit this kind of
non-7-bit Latin character. However, characters with the 8th bit
set might cause problems for certain other transports and language
environments. I must admit that, apart from RFC-822 mail-type contexts,
I can't think of what these might be, but I'd be inclined to steer
clear of non-7-bit characters just in case. However, if others
(e.g. with fewer Anglo-Saxon prejudices) think that it's an important
requirement to permit transmission of characters like this within
SAMP we could take that on board. We could even in principle say
that this escaping mechanism could be used to specify any Unicode
character - but I think that would definitely be a bad idea as it
would effectively restrict use of the protocol to languages with
Unicode support, which excludes quite a lot.
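
To make the 8-bit variant of option 3 concrete, here is a minimal
Python sketch of what such an escaping convention might look like.
It is only an illustration under stated assumptions, not an agreed
SAMP definition: the function names are hypothetical, the set of
characters passed through unescaped (tab, LF, CR and printable 7-bit
ASCII) is a guess at what a restricted SAMP string would allow, and
escaping the backslash itself is an extra assumption made so that the
encoding can be reversed unambiguously.

```python
import re

# ASSUMPTION: characters sent as-is are tab, LF, CR and printable ASCII.
# The actual permitted range is whatever the SAMP string definition says.
ALLOWED = {0x09, 0x0A, 0x0D} | set(range(0x20, 0x7F))


def escape_8bit(text):
    """Escape disallowed characters as \\uXXXX (hypothetical helper).

    Only code points up to 0xff are accepted, matching the "any 8-bit
    character" variant; anything higher raises ValueError.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFF:
            raise ValueError("code point U+%04X is outside the 8-bit range" % cp)
        # ASSUMPTION: the backslash is escaped too, so decoding is unambiguous.
        if ch == "\\" or cp not in ALLOWED:
            out.append("\\u%04x" % cp)
        else:
            out.append(ch)
    return "".join(out)


_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_8bit(text):
    """Reverse escape_8bit by replacing each \\uXXXX with its character."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

With this sketch, a name containing the character 0xf1 would travel
as pure 7-bit text (the 0xf1 becomes the six characters \u00f1), and
character 31 becomes \u001f as in the example in option 3. The
receiver would need to apply the reverse step to recover the original
string, which is the extra burden on every client that this option
implies.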
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/