Zsh Mailing List Archive
Re: D07multibyte.ztst failure on HP-UX 11.11
On Wed, 6 May 2009 21:50:26 +0000
Paul Ackersviller <pda@xxxxxxxxxxxxxxxx> wrote:
> On Wed, May 06, 2009 at 08:22:06PM +0100, Peter Stephenson wrote:
> > On Tue, 5 May 2009 19:39:31 +0000
> > Paul Ackersviller <pda@xxxxxxxxxxxxxxxx> wrote:
> > > I can get read to silently fail on the HP box with
> > >
> > > env -i LANG=en_US.utf8 ../Src/zsh -fc \
> > > "(LC_ALL=C; print \$'\\u00e9') | read || print failure"
>
> > Taking out the LC_ALL should produce some sensible output if you omit
> > the read. (Replacing it with xxd or failing that od -x might make it
> > clearer what's going on.)
>
> Not quite: "zsh:1: cannot do charset conversion (iconv failed)"
It's not clear why it should fail, but the error message is OK and allowed
for by the test.
> > If you're simply taking out the subshell and not replacing it with
> > anything then the LC_ALL=C covers the "read" as well as the "print".
> > So possibly something strange is happening in the read. Replacing it
> > with xxd might be even more instructive here.
>
> This gives
> 0000000 c50a
> Does this mean the 0a should be the second byte, but is perhaps being
> interpreted as newline?
So this comes from
env -i LANG=en_US.utf8 ../Src/zsh -fc \
"LC_ALL=C; print \$'\\u00e9' | read || print failure"
I get "character not in range" here. It looks like your system is
outputting 0xc5, which I wouldn't expect to be a valid character in the C
locale, and I can't work out why it comes from Unicode character 0xe9. The
UTF-8 would be 0xc3a9, the ISO-8859-1 or -15 would be 0xe9.
The 0x0a really is a newline.
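For reference, the expected bytes can be checked outside zsh. This is only a sketch: it assumes a POSIX printf, od, and an iconv that knows the names UTF-8 and ISO-8859-1, and hex_bytes is a helper name made up for this example.

```shell
# hex_bytes is a hypothetical helper: print stdin as a run of hex bytes.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# U+00E9 in UTF-8, written as octal escapes so any POSIX printf accepts it.
printf '\303\251' | hex_bytes; echo      # c3a9

# The same character converted to ISO-8859-1 (assumes iconv is installed).
printf '\303\251' | iconv -f UTF-8 -t ISO-8859-1 | hex_bytes; echo      # e9
```

Neither encoding contains 0xc5, which is why that byte is unexpected here.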
In the test you show, read is running with UTF-8. I can confirm that
on my system (where I happen already to be in the en_GB.UTF-8 locale)
(unsetopt multibyte; print $'\xc5') | xxd
gives what you're sending to read, and
(unsetopt multibyte; print $'\xc5') | read
returns status 1 with no output.
So this all tallies, and I think we've found out all we need, but I'm not
sure about the fix; possibly read should output an error on an invalid
character in MULTIBYTE mode (which we could add to the test)? Does anyone
see a problem with that?
I'm fairly happy this isn't a shell bug, but I'd still like the shell to
have enough facilities to be able to detect the problem.
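In the meantime, something like the following could detect the bad input from outside the shell. It is a rough sketch rather than a proposed fix: it assumes an iconv that exits non-zero on an invalid input sequence, and is_valid_utf8 is a name invented for this example.

```shell
# is_valid_utf8 is a hypothetical helper: succeeds only if stdin is
# well-formed UTF-8. It relies on iconv failing on invalid sequences.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# 0xc5 followed by newline: a lead byte with no continuation byte.
printf '\305\n' | is_valid_utf8 || echo "invalid UTF-8"

# 0xc3 0xa9 followed by newline: a correctly encoded U+00E9.
printf '\303\251\n' | is_valid_utf8 && echo "valid UTF-8"
```

A test could run such a check on the output of print before piping it to read, so that an invalid byte is reported rather than silently dropped.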
--
Peter Stephenson <pws@xxxxxxx> Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070