Zsh Mailing List Archive
Re: bug in completion/expansion of files with LANG=C
- X-seq: zsh-workers 22140
- From: Wayne Davison <wayned@xxxxxxxxxxxxxxxxxxxxx>
- To: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
- Subject: Re: bug in completion/expansion of files with LANG=C
- Date: Sun, 8 Jan 2006 00:06:21 -0800
- Cc: zsh-workers@xxxxxxxxxx
- In-reply-to: <1060108055620.ZM15382@xxxxxxxxxxxxxxxxxxxxxxx>
- Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
- References: <20060106215829.GG10111@xxxxxxxxxxxxx> <20060107224447.GA30232@xxxxxxxxxxxxx> <1060108055620.ZM15382@xxxxxxxxxxxxxxxxxxxxxxx>
On Sun, Jan 08, 2006 at 05:56:20AM +0000, Bart Schaefer wrote:
> I prefer the \M- form because it gives you some clue what you should do
> to generate the equivalent value from the keyboard.
Fair enough -- let's just leave it alone, then.
As for my patch in the grandparent email, I noticed some problems with
it: the manpage for mbrtowc() says that the state of the mbstate_t
object is undefined after the function returns -1, so the code should
reset it to a known state. When the function returns -2, it means the
code scanned to the end of the string without finding the end of a wide
character, so perhaps we should treat all the remaining characters as
invalid? I'm not certain that's the correct thing to do, so I'll leave
the code handling -2 the same way as -1 for now. Finally, I wasn't
setting the right visible width for the \M-... string (I had mistakenly
hardwired it to "1").
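
The recovery rule described above can be sketched in isolation. This is only an illustration, not code from Src/utils.c: count_units() is a hypothetical helper, and it assumes, as the patch does, that -2 is handled the same way as -1 and that a decoded L'\0' occupies exactly one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of zsh: counts the units in s[0..len),
 * where a unit is either one successfully decoded wide character or one
 * byte that failed to decode.  If invalid is non-NULL, it receives the
 * number of bytes that failed. */
static size_t
count_units(const char *s, size_t len, size_t *invalid)
{
    mbstate_t ps;
    size_t units = 0, bad = 0;

    assert(s != NULL);
    memset(&ps, 0, sizeof ps);          /* start in the initial state */

    while (len > 0) {
        wchar_t c;
        size_t ret = mbrtowc(&c, s, len, &ps);

        if (ret == (size_t)-1 || ret == (size_t)-2) {
            /* After -1 the state is undefined; after -2 it holds a
             * partial character that we are abandoning.  Either way,
             * reset it and step over a single byte. */
            memset(&ps, 0, sizeof ps);
            ret = 1;
            bad++;
        } else if (ret == 0) {
            /* A decoded L'\0' reports 0; assume it used one byte. */
            ret = 1;
        }
        s += ret;
        len -= ret;
        units++;
    }
    if (invalid)
        *invalid = bad;
    return units;
}
```

Which bytes count as invalid depends on the current locale, so the sketch only guarantees that every input byte is consumed and that the loop terminates.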
While twiddling these things I noticed a couple of other things that I
think could be improved:
1. It looks to me like the code in wcs_nicechar() that calls
wcswidth(&c, 1) could really just call wcwidth(c), right? If not,
what am I missing?
2. The code in mb_niceformat() calls strlen() on the "fmt" string
returned by wcs_nicechar(), but it seems to me that it could just use
the width that wcs_nicechar() returned, right?
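
For point 1, a minimal check that the two calls agree for a single character might look like the following. char_width() is a hypothetical name used only for this sketch, and the _XOPEN_SOURCE define is an assumption about what the C library needs in order to declare these functions (glibc does).

```c
#define _XOPEN_SOURCE 700   /* assumed necessary to declare wcwidth()/wcswidth() */
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper, not part of zsh: returns the column width of c
 * via wcwidth() and checks that wcswidth() on the same single
 * character reports the same value. */
static int
char_width(wchar_t c)
{
    int w = wcwidth(c);

    /* Both functions return -1 for a non-printable character and 0 for
     * L'\0', so the single-character results should always match. */
    assert(w == wcswidth(&c, 1));
    return w;
}
```

Point 2 is a different question: the value here is a display width in columns, whereas strlen() counts bytes, and in general the two only coincide when every byte of the string occupies exactly one column.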
Attached is an updated version of my patch that fixes the aforementioned
bugs and implements the 2 improvements.
..wayne..
--- Src/utils.c 15 Dec 2005 14:51:41 -0000 1.108
+++ Src/utils.c 8 Jan 2006 07:55:56 -0000
@@ -375,7 +375,7 @@ wcs_nicechar(wchar_t c, size_t *widthp,
}
if (widthp)
- *widthp = (s - buf) + wcswidth(&c, 1);
+ *widthp = (s - buf) + wcwidth(c);
if (swidep)
*swidep = s;
for (mbptr = mbstr; ret; s++, mbptr++, ret--) {
@@ -3446,8 +3446,8 @@ niceztrlen(char const *s)
mod_export size_t
mb_niceformat(const char *s, FILE *stream, char **outstrp, int heap)
{
- size_t l = 0, newl, ret;
- int umlen, outalloc, outleft;
+ size_t l = 0, outlen, outleft, ret;
+ int umlen, outalloc;
wchar_t c;
char *ums, *ptr, *fmt, *outstr, *outptr;
mbstate_t ps;
@@ -3473,31 +3473,31 @@ mb_niceformat(const char *s, FILE *strea
while (umlen > 0) {
ret = mbrtowc(&c, ptr, umlen, &ps);
- if (ret == (size_t)-1 || ret == (size_t)-2)
- {
- /*
- * We're a bit stuck here. I suppose we could
- * just stick with \M-... for the individual bytes.
- */
- break;
- }
- /*
- * careful in case converting NULL returned 0: NULLs are real
- * characters for us.
- */
- if (c == L'\0' && ret == 0)
+ if (ret != (size_t)-1 && ret != (size_t)-2) {
+ /* Careful: converting '\0' returns 0, but a '\0' is a
+ * real character for us, so we should consume 1 byte. */
+ if (c == L'\0')
+ ret = 1;
+
+ fmt = wcs_nicechar(c, &outlen, NULL);
+ } else {
+ /* Get ps out of its undefined state. */
+ memset(&ps, 0, sizeof ps);
ret = 1;
+
+ /* The byte didn't convert, so output it as a \M-... sequence. */
+ fmt = nicechar(*(unsigned char*)ptr);
+ outlen = strlen(fmt);
+ }
+
umlen -= ret;
ptr += ret;
-
- fmt = wcs_nicechar(c, &newl, NULL);
- l += newl;
+ l += outlen;
if (stream)
zputs(fmt, stream);
if (outstr) {
/* Append to output string */
- int outlen = strlen(fmt);
if (outlen >= outleft) {
/* Reallocate to twice the length */
int outoffset = outptr - outstr;