Zsh Mailing List Archive

Re: bug in completion/expansion of files with LANG=C

X-seq: zsh-workers 22140
From: Wayne Davison <wayned@xxxxxxxxxxxxxxxxxxxxx>
To: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
Subject: Re: bug in completion/expansion of files with LANG=C
Date: Sun, 8 Jan 2006 00:06:21 -0800
Cc: zsh-workers@xxxxxxxxxx
In-reply-to: <1060108055620.ZM15382@xxxxxxxxxxxxxxxxxxxxxxx>
Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
References: <20060106215829.GG10111@xxxxxxxxxxxxx> <20060107224447.GA30232@xxxxxxxxxxxxx> <1060108055620.ZM15382@xxxxxxxxxxxxxxxxxxxxxxx>

On Sun, Jan 08, 2006 at 05:56:20AM +0000, Bart Schaefer wrote:
> I prefer the \M- form because it gives you some clue what you should do
> to generate the equivalent value from the keyboard.

Fair enough -- let's just leave it alone, then.

As for my patch in the grandparent email, I noticed some problems with
it:  the manpage for mbrtowc() says that the state of the mbstate_t
object is undefined after the function returns -1, so the code should
reset it to a known state.  When the function returns -2, it means the
code scanned to the end of the string without finding the end of a wide
character, so perhaps we should treat all the remaining characters as
invalid?  I'm not certain that's the correct thing to do, so I'll leave
the code handling -2 the same way as -1 for now.  Finally, I wasn't
setting the right visible width for the \M-... string (I had mistakenly
hardwired it to "1").
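
To illustrate the error handling described above, here is a minimal standalone sketch (not the zsh code itself). The helper name scan_multibyte() and its counters are hypothetical, made up for illustration. It treats -1 and -2 the same way: reset the mbstate_t, consume one byte, and count that byte as invalid. It also consumes one byte when a '\0' converts and mbrtowc() returns 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of zsh: walk len bytes of s, returning how
 * many units (converted characters plus unconvertible bytes) were seen.
 * The number of unconvertible bytes is stored in *invalidp if non-NULL. */
size_t
scan_multibyte(const char *s, size_t len, size_t *invalidp)
{
    mbstate_t ps;
    size_t units = 0, invalid = 0;

    memset(&ps, 0, sizeof ps);	/* start in the initial shift state */
    while (len > 0) {
	wchar_t c;
	size_t ret = mbrtowc(&c, s, len, &ps);

	if (ret == (size_t)-1 || ret == (size_t)-2) {
	    /* The state is not reliable after an error, so reset it. */
	    memset(&ps, 0, sizeof ps);
	    ret = 1;		/* skip just the offending byte */
	    invalid++;
	} else if (ret == 0) {
	    /* A '\0' converted; it still occupies one input byte. */
	    ret = 1;
	}
	assert(ret <= len);	/* never step past the end of the input */
	s += ret;
	len -= ret;
	units++;
    }
    if (invalidp)
	*invalidp = invalid;
    return units;
}
```

Which bytes count as invalid depends on the current locale, so the same input can give different counts under LANG=C and a UTF-8 locale.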

While making these fixes I noticed a couple of other things that I
think could be improved:

1. It looks to me like the code in wcs_nicechar() that calls
wcswidth(&c, 1) could really just call wcwidth(c), right?  If not,
what am I missing?

2. The code in mb_niceformat() calls strlen() on the "fmt" string
returned by wcs_nicechar(), but it seems to me that it could just use
the width that wcs_nicechar() returned, right?
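
The idea in point 2 can be sketched with a formatter that reports its own output length, so the caller never needs strlen(). This is only an illustration: nice_byte() and nice_total_width() are hypothetical names, and the \M-/^ notation is loosely modelled on nicechar() rather than copied from it. The assert() shows that the reported length matches strlen() here, which holds only because this sketch emits single-byte, single-column ASCII. For real wide characters the display width and the byte count can differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a nicechar()-style formatter; not zsh's code.
 * Returns a pointer to a static buffer and stores the length in *lenp. */
const char *
nice_byte(unsigned char b, size_t *lenp)
{
    static char buf[8];		/* longest output is "\M-^?" plus NUL */
    char *s = buf;

    if (b & 0x80) {		/* high bit set: emit the meta prefix */
	*s++ = '\\';
	*s++ = 'M';
	*s++ = '-';
	b &= 0x7f;
    }
    if (b == 0x7f) {		/* DEL */
	*s++ = '^';
	*s++ = '?';
    } else if (b < 0x20) {	/* control character */
	*s++ = '^';
	*s++ = (char)(b + 64);
    } else {
	*s++ = (char)b;
    }
    *s = '\0';
    if (lenp)
	*lenp = (size_t)(s - buf);
    return buf;
}

/* Sum the output lengths using the value the formatter already computed. */
size_t
nice_total_width(const unsigned char *bytes, size_t n)
{
    size_t total = 0, w;

    while (n--) {
	const char *fmt = nice_byte(*bytes++, &w);
	assert(strlen(fmt) == w);	/* true for this ASCII-only output */
	total += w;
    }
    return total;
}
```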

Attached is an updated version of my patch that fixes the aforementioned
bugs and implements the two improvements.

..wayne..

--- Src/utils.c	15 Dec 2005 14:51:41 -0000	1.108
+++ Src/utils.c	8 Jan 2006 07:55:56 -0000
@@ -375,7 +375,7 @@ wcs_nicechar(wchar_t c, size_t *widthp, 
     }
 
     if (widthp)
-	*widthp = (s - buf) + wcswidth(&c, 1);
+	*widthp = (s - buf) + wcwidth(c);
     if (swidep)
 	*swidep = s;
     for (mbptr = mbstr; ret; s++, mbptr++, ret--) {
@@ -3446,8 +3446,8 @@ niceztrlen(char const *s)
 mod_export size_t
 mb_niceformat(const char *s, FILE *stream, char **outstrp, int heap)
 {
-    size_t l = 0, newl, ret;
-    int umlen, outalloc, outleft;
+    size_t l = 0, outlen, outleft, ret;
+    int umlen, outalloc;
     wchar_t c;
     char *ums, *ptr, *fmt, *outstr, *outptr;
     mbstate_t ps;
@@ -3473,31 +3473,31 @@ mb_niceformat(const char *s, FILE *strea
     while (umlen > 0) {
 	ret = mbrtowc(&c, ptr, umlen, &ps);
 
-	if (ret == (size_t)-1 || ret == (size_t)-2)
-	{
-	    /*
-	     * We're a bit stuck here.  I suppose we could
-	     * just stick with \M-... for the individual bytes.
-	     */
-	    break;
-	}
-	/*
-	 * careful in case converting NULL returned 0: NULLs are real
-	 * characters for us.
-	 */
-	if (c == L'\0' && ret == 0)
+	if (ret != (size_t)-1 && ret != (size_t)-2) {
+	    /* Careful:  converting '\0' returns 0, but a '\0' is a
+	     * real character for us, so we should consume 1 byte. */
+	    if (c == L'\0')
+		ret = 1;
+
+	    fmt = wcs_nicechar(c, &outlen, NULL);
+	} else {
+	    /* Get ps out of its undefined state. */
+	    memset(&ps, 0, sizeof ps);
 	    ret = 1;
+
+	    /* The byte didn't convert, so output it as a \M-... sequence. */
+	    fmt = nicechar(*(unsigned char*)ptr);
+	    outlen = strlen(fmt);
+	}
+
 	umlen -= ret;
 	ptr += ret;
-
-	fmt = wcs_nicechar(c, &newl, NULL);
-	l += newl;
+	l += outlen;
 
 	if (stream)
 	    zputs(fmt, stream);
 	if (outstr) {
 	    /* Append to output string */
-	    int outlen = strlen(fmt);
 	    if (outlen >= outleft) {
 		/* Reallocate to twice the length */
 		int outoffset = outptr - outstr;

Follow-Ups:
- Re: bug in completion/expansion of files with LANG=C
  - From: Peter Stephenson

References:
- bug in completion/expansion of files with LANG=C
  - From: Wayne Davison
- Re: bug in completion/expansion of files with LANG=C
  - From: Wayne Davison
- Re: bug in completion/expansion of files with LANG=C
  - From: Bart Schaefer
