String character range
dtody at nrao.edu
Mon Aug 4 09:04:08 PDT 2008
Hi Mark -
Sure, I agree that the range of allowable chars should be restricted
as you suggest. My suggestion is to specify UTF-8, restricted in the
way already discussed for 7-bit chars, but allowing multi-byte UTF-8
encoded chars to pass through. That would seem to do it, and we still
have simple ASCII virtually all of the time, so I don't think this will
break legacy code. If at some point full-up Unicode is needed (e.g.
16-bit chars), that should be a different data type.
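Doug's proposal can be expressed as a simple validation rule. This is a minimal Python sketch. It assumes that "restricted as discussed for 7-bit chars" means the XML 1.0 Char production: tab, LF, CR, and code points from 0x20 upward, excluding surrogates and 0xFFFE/0xFFFF. The names is_xml_char and is_acceptable_string are hypothetical and do not come from any SAMP document.

```python
def is_xml_char(cp: int) -> bool:
    """True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)            # the only permitted control chars
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_acceptable_string(raw: bytes) -> bool:
    """Hypothetical rule: raw must be valid UTF-8 and contain only XML-legal chars."""
    try:
        text = raw.decode("utf-8")       # strict decoding rejects malformed bytes
    except UnicodeDecodeError:
        return False
    return all(is_xml_char(ord(ch)) for ch in text)


print(is_acceptable_string(b"plain ASCII"))                       # True
print(is_acceptable_string("\u00c5ngstr\u00f6m".encode("utf-8")))  # True
print(is_acceptable_string(b"bad\x1fbyte"))                       # False
```

Plain ASCII and accented text both pass, while the 0x1f control character is still rejected. This matches the intent of restricting the 7-bit range while letting multi-byte UTF-8 through.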
On Mon, 4 Aug 2008, Mark Taylor wrote:
> On Fri, 1 Aug 2008, Doug Tody wrote:
> > Hey Mark -
> > I agree with your sentiment that string data which we want to
> > manipulate in any language or environment should be simple; if
> > necessary a separate datatype could be declared for representing
> > e.g. general Unicode encoded text.
> > What about UTF-8 though? This is backwards compatible with ASCII
> > but allows any Unicode character to be represented using multi-byte
> > sequences - if there are no funny characters it is the same as ASCII.
> > This is much like your escape sequence proposal, but is a widely used
> > standard. XML has mandatory support for UTF-8 (almost any XML document
> > one sees is UTF-8 encoded) so there should be no problems there.
> Hi Doug,
> you're right, UTF-8 does look like a better solution than the \uxxxx
> escaping mechanism (borrowed from Java) that I suggested as far as
> transmitting things like accented letters and characters from non-Latin
> alphabets. However, it doesn't solve the problem which started this
> thread off, since you still won't be able to include characters in
> the ranges excluded by the XML Char definition; those are simply not permitted
> in an XML document, regardless of encoding (and in any
> case the UTF-8 encoding of 0x1f is the single byte 0x1f).
> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
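Mark's point about encodings is easy to check. This short Python sketch encodes three strings. The helper name utf8_bytes is used only for illustration, and the accented string is written with \u escapes so the exact code points are unambiguous. Pure ASCII is unchanged, accented letters become multi-byte sequences, and 0x1f is still the single byte 0x1f. For that reason, choosing UTF-8 does not make characters excluded by the XML Char definition legal.

```python
def utf8_bytes(text: str) -> bytes:
    """Encode text as UTF-8 (illustrative helper)."""
    return text.encode("utf-8")


# "\u00c5ngstr\u00f6m" is "Ångström" written with escapes.
for sample in ["M31", "\u00c5ngstr\u00f6m", "\x1f"]:
    print(repr(sample), "->", utf8_bytes(sample).hex(" "))

# Output:
# 'M31' -> 4d 33 31
# 'Ångström' -> c3 85 6e 67 73 74 72 c3 b6 6d
# '\x1f' -> 1f
```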
More information about the apps-samp mailing list