Zsh Mailing List Archive
Re: Quoting problem and crashes with ${(#)var}
- X-seq: zsh-workers 23173
- From: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
- To: zsh-workers@xxxxxxxxxx
- Subject: Re: Quoting problem and crashes with ${(#)var}
- Date: Tue, 13 Feb 2007 23:48:14 -0800
- In-reply-to: <200702132111.l1DLB5rA003849@xxxxxxxxxxxxxxxxx>
- Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
- References: <200702132111.l1DLB5rA003849@xxxxxxxxxxxxxxxxx>
On Feb 13, 9:11pm, Peter Stephenson wrote:
}
} Bart Schaefer wrote:
} > I'm a bit puzzled, given this test ...
} >
} > } if (isset(MULTIBYTE) && ires > 127) {
} >
} > ... why ${(V)x} for x in 128 through 159 display as \u0080 through
} > \u009f, but then 160 through 255 are treated as directly printable.
}
} On my terminal, I've got different effects, which worries me more: if I
} assign the UTF-8 representation of character 128 to a variable, ${(V)x}
} tries to print it out directly (and it only shows up if I send it through
} xxd or equivalent).
Did you remember to use "print -R"? If I do
print ${(V)x}
then print interprets the \u0080 sequence and sends a raw byte. That
doesn't happen with
print -R ${(V)x}
} (However, the ZLE function insert-unicode-char correctly
} shows it as a control character, ^ followed by A with a grave accent.)
That's what I expected ${(V)x} to do, but instead it displays it as a
\u escape.
} > % for x in {1..254}; h[x]=${(V#)x}
} > zsh: character not in range
} >
} > That seems wrong.
}
} Well, because you've (explicitly or otherwise) got it set to a locale
} with no knowledge of characters beyond 127; it only knows about the
} portable character set. It's simply telling you it doesn't know what to
} do with them.
}
} What you're asking is for some kludged special case for LANG=C
Well, no, I'm not. I'm asking for two things:
(1) when we get "character not in range", we don't treat it as a fatal error
and bail out of the whole surrounding loop; and
(2) regardless of the locale, single-byte values should always be
convertible to something "viewable", either \u00xy or \M-c.
There might be cases where "character not in range" is a fatal error,
but this doesn't seem as though it ought to be one of them.
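To make point (2) concrete, here is a rough sketch of a locale-independent fallback for single-byte values. The function name viewable_byte and the choice of \u00xy (rather than \M-c) for the high half are assumptions made for illustration; none of this is existing zsh code.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a zsh function): write a viewable form of
 * the byte value c (0..255) into out without consulting the locale.
 * Returns the number of characters written, or -1 if c is out of
 * range or out is too small.
 */
int viewable_byte(unsigned int c, char *out, size_t outlen)
{
    int n;

    if (c > 255)
        return -1;
    if (c >= 128)                       /* high half: \u00xy escape */
        n = snprintf(out, outlen, "\\u%04x", c);
    else if (c < 32)                    /* control chars: ^@ .. ^_ */
        n = snprintf(out, outlen, "^%c", (char)(c + 64));
    else if (c == 127)                  /* DEL */
        n = snprintf(out, outlen, "^?");
    else                                /* printable ASCII as-is */
        n = snprintf(out, outlen, "%c", (char)c);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return n;
}
```

With a 16-byte buffer, viewable_byte(128, buf, sizeof buf) fills buf with \u0080 and viewable_byte(1, buf, sizeof buf) fills it with ^A, whatever the locale is.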
--- /tmp/subst.c 2007-02-13 23:44:46.000000000 -0800
+++ /tmp/subst.c5229YwW 2007-02-13 23:44:46.000000000 -0800
@@ -1193,18 +1193,22 @@
 substevalchar(char *ptr)
 {
     zlong ires = mathevali(ptr);
-    int len;
+    int len = 0;
     if (errflag)
         return NULL;
 #ifdef MULTIBYTE_SUPPORT
     if (isset(MULTIBYTE) && ires > 127) {
+        int one = noerrs;
         char buf[10];
         /* inefficient: should separate out \U handling from getkeystring */
         sprintf(buf, "\\U%.8x", (unsigned int)ires);
+        noerrs = 1;
         ptr = getkeystring(buf, &len, GETKEYS_BINDKEY, NULL);
-    } else
+        noerrs = one, errflag = 0;
+    }
+    if (len == 0)
 #endif
     {
         ptr = zhalloc(2);
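For anyone reading the patch without the zsh source to hand, the idea is: silence error reporting around the conversion, restore the previous state afterwards, and fall back to the single-byte path if nothing was produced. Below is a standalone sketch of that pattern. Everything with a _demo suffix is a made-up stand-in (including the assumption that the converter sets the error flag even when messages are suppressed, and a converter that only handles two-byte UTF-8); it is not the real zsh code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for zsh's noerrs and errflag globals (assumed semantics). */
static int noerrs_demo = 0;
static int errflag_demo = 0;
/* Hypothetical: largest code point the current "locale" knows about. */
static long locale_max_demo = 127;

/* Hypothetical stand-in for getkeystring(): encodes v (128..0x7ff) as
 * two UTF-8 bytes, or flags an error and returns NULL with *len = 0. */
static char *convert_demo(long v, int *len)
{
    char *s;

    *len = 0;
    if (v < 128 || v > 0x7ff || v > locale_max_demo) {
        if (!noerrs_demo)
            fprintf(stderr, "character not in range\n");
        errflag_demo = 1;
        return NULL;
    }
    if (!(s = malloc(3)))
        return NULL;
    s[0] = (char)(0xc0 | (v >> 6));
    s[1] = (char)(0x80 | (v & 0x3f));
    s[2] = '\0';
    *len = 2;
    return s;
}

/* Try the multibyte conversion quietly; if it yields nothing, fall
 * back to a single raw byte, as the patched code path does. */
char *evalchar_demo(long v, int *lenp)
{
    char *p = NULL;
    int len = 0;

    if (v > 127) {
        int saved = noerrs_demo;    /* remember caller's setting */
        noerrs_demo = 1;            /* suppress the error message */
        p = convert_demo(v, &len);
        noerrs_demo = saved;        /* restore, and clear the error */
        errflag_demo = 0;
    }
    if (len == 0) {                 /* conversion skipped or failed */
        if (!(p = malloc(2))) {
            *lenp = 0;
            return NULL;
        }
        p[0] = (char)v;
        p[1] = '\0';
        len = 1;
    }
    *lenp = len;
    return p;
}
```

With locale_max_demo left at 127, evalchar_demo(200, &len) returns the single byte 0xc8 and leaves errflag_demo clear, so a surrounding loop would carry on. With locale_max_demo raised to 0x7ff, evalchar_demo(0xe9, &len) returns the two bytes 0xc3 0xa9.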