Zsh Mailing List Archive
Re: Silent UTF-8 assumption?
- X-seq: zsh-workers 23413
- From: Peter Stephenson <pws@xxxxxxx>
- To: zsh-workers@xxxxxxxxxx
- Subject: Re: Silent UTF-8 assumption?
- Date: Thu, 10 May 2007 10:46:09 +0100
- In-reply-to: <200705101156.19776.arvidjaar@xxxxxxxxxx>
- Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
- References: <200705101156.19776.arvidjaar@xxxxxxxxxx>
Andrey Borzenkov wrote:
> This caught my attention:
>
> static wchar_t
> charref(char *x, char *y)
> {
>     wchar_t wc;
>     size_t ret;
>
>     if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
>         return (wchar_t) STOUC(*x);
>
> Well, this is definitely not valid for an arbitrary multibyte character
> set.
We're not using an arbitrary character set; we're using one that has the
portable character set (i.e. ASCII) as a 7-bit subset and that shares
the property of UTF-8 that every octet of a true multibyte character has
the eighth bit set. That's entirely for a practical reason: if we don't
make that assumption, all hell will break loose, because we would have
to make *every* part of the shell that ever tests a character, even an
ASCII character, multibyte aware.
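A minimal sketch of the fast path this assumption permits, modelled loosely on the quoted charref() test. `decode_char` is a hypothetical helper, not zsh code, and it assumes a stateless encoding such as UTF-8: any byte below 0x80 is returned directly as an ASCII character, and only bytes with the eighth bit set are handed to the standard mbrtowc().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not zsh code): decode the character starting at s,
 * with at most n bytes available.  Assumes the locale's character set has
 * ASCII as a 7-bit subset and sets the eighth bit in every octet of a
 * multibyte character (true of UTF-8).  On success, stores the number of
 * bytes consumed in *len and returns the character; on an invalid or
 * truncated sequence, resets *st and returns WEOF.
 */
static wint_t decode_char(const char *s, size_t n, mbstate_t *st, size_t *len)
{
    unsigned char c;
    wchar_t wc;
    size_t ret;

    assert(s != NULL && n > 0 && st != NULL && len != NULL);
    c = (unsigned char)*s;

    /* Fast path: under the assumption, a byte without the eighth bit
     * set is always a complete ASCII character, so the multibyte
     * machinery is never involved. */
    if (!(c & 0x80)) {
        *len = 1;
        return (wint_t)c;
    }

    /* Slow path: only genuinely multibyte input reaches mbrtowc(). */
    ret = mbrtowc(&wc, s, n, st);
    if (ret == (size_t)-1 || ret == (size_t)-2) {
        memset(st, 0, sizeof *st);   /* state is undefined after an error */
        return WEOF;
    }
    *len = ret;
    return (wint_t)wc;
}
```

A caller scanning a string would advance by `len` bytes after each call. Without the assumption even the ASCII branch would be unsafe, because a byte such as `/` could turn out to be the trailing octet of a multibyte character.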
There's a good chance the multibyte character set in question is UTF-8,
but it doesn't necessarily have to be.
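To illustrate the UTF-8 property referred to above, here is a small, self-contained encoder with a check that every octet of every multi-octet encoding has the eighth bit set. It is an illustrative sketch, not taken from zsh; `utf8_encode` and `high_bit_property_holds` are hypothetical names. UTF-8 lead octets have the form 110xxxxx, 1110xxxx or 11110xxx and continuation octets 10xxxxxx, so no byte in the range 0x00 to 0x7F can occur inside a multibyte character. A character set such as Shift_JIS does not satisfy the assumption: there, ソ is encoded as 0x83 0x5C, and 0x5C is the ASCII backslash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative UTF-8 encoder.  Writes the encoding of code point cp to out
 * and returns its length in octets (1 to 4), or 0 if cp is above U+10FFFF.
 * Surrogate code points are not rejected; that does not affect the check.
 */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx, then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/*
 * Returns 1 if, for every code point in [lo, hi] that needs more than one
 * octet, all of its octets have the eighth bit (0x80) set; 0 otherwise.
 * hi should not exceed 0x10FFFF.
 */
static int high_bit_property_holds(uint32_t lo, uint32_t hi)
{
    unsigned char buf[4];
    uint32_t cp;
    size_t i, n;

    for (cp = lo; cp <= hi; cp++) {
        n = utf8_encode(cp, buf);
        if (n < 2)
            continue;                     /* single-octet ASCII: skip */
        for (i = 0; i < n; i++)
            if (!(buf[i] & 0x80))
                return 0;
    }
    return 1;
}
```

Calling `high_bit_property_holds(0x80, 0x10FFFF)` walks every code point outside ASCII and returns 1 for UTF-8.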
--
Peter Stephenson <pws@xxxxxxx> Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070