Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

PATCH: handle Stéphane correctly



In the original report we had [[ é = *$'\xa9' ]], which has op == P_STAR.
In this case charstart is filled out inline; in the other cases it is
done by the patrepeat function. If ptlen > P_LS_LEN(next) we have
already returned. If it is smaller, we have backtracked into the region
where charstart is valid, so we can check it. Finally, if they are equal
we know we are on a valid character boundary (but charstart might not be
valid there; in fact, multiple tests fail if you leave out the ptlen
check).

Separately,
% [[ éé = é#$'\xa9' ]]; echo $?
 pattern.c:4113: closure following more than one character
0
% [[ éé = (é)#$'\xa9' ]]; echo $?
1
so just treat multibyte single characters as "not simple" (that is,
treat x# as (x)#), which fixes that case too.

---

I inverted the conditions in the tests so that the "|| print fail n" pattern
works. Otherwise we would need to expect status 1, and then the test harness
doesn't show the diff of the output at all, which defeats the purpose of
printing which case failed.

 Src/pattern.c          | 17 +++++++++++++----
 Test/D07multibyte.ztst | 17 +++++++++++++++++
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/Src/pattern.c b/Src/pattern.c
index 1e0ae88d99..017c20a1ea 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -1369,6 +1369,11 @@ patcomppiece(int *flagp, int paren)
 	    }
 	}
 	slen = (patparse - str0) - nmeta;
+#ifdef MULTIBYTE_SUPPORT
+	if ((patglobflags & GF_MULTIBYTE) && slen > 1)
+	    /* for multibyte single characters, treat x# as (x)# */
+	    flags &= ~P_SIMPLE;
+#endif
 	/* First add length, which is a long */
 	patadd((char *)&slen, 0, sizeof(long), 0);
 	/*
@@ -3348,10 +3353,14 @@ patmatch(Upat prog)
 			    return 0;
 			/* Yes, just position appropriately and test. */
 			patinput += ptlen - P_LS_LEN(next);
-			/*
-			 * Here we will need to be careful that patinput is not
-			 * in the middle of a multibyte character.
-			 */
+#ifdef MULTIBYTE_SUPPORT
+			/* Make sure we aren't in the middle of
+			 * a multibyte character */
+			if ((patglobflags & GF_MULTIBYTE) &&
+			    ptlen < P_LS_LEN(next) &&
+			    !charstart[patinput - start])
+			    return 0;
+#endif
 			/* Continue loop with P_EXACTLY test. */
 			break;
 		    }
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index b65912fe97..28cb9d907e 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -556,6 +556,23 @@ F:This is considered a bugfix in zsh
   [[ $'\xe3\x83\x9b' = ? ]] || print fail 4
 0:Testing incomplete and invalid multibyte character components
 
+  [[ é != *$'\xc3'* ]]        || print fail 1
+  [[ é != *$'\xa9'* ]]        || print fail 2
+  [[ Stéphane != *$'\xc3'* ]] || print fail 3
+  [[ Stéphane != *$'\xa9'* ]] || print fail 4
+  [[ é != $'\xc3'* ]]         || print fail 5
+  [[ é != *$'\xa9' ]]         || print fail 6
+0:Raw bytes don't match multibyte characters in * patterns
+
+  [[ éé != é#$'\xa9' ]]     || print fail 1
+  [[ é != [aé]#$'\xa9' ]]   || print fail 2
+  [[ é != [^x]#$'\xa9' ]]   || print fail 3
+  [[ aé != [aé]#$'\xa9' ]]  || print fail 4
+  [[ aé != [^x]#$'\xa9' ]]  || print fail 5
+  [[ aé != [aé]##$'\xa9' ]] || print fail 6
+  [[ aé != [^x]##$'\xa9' ]] || print fail 7
+0:Raw bytes don't match multibyte characters part 2
+
   print -r -- ${(q+):-ホ}
   foo='She said "ホ".  I said "You can'\''t '\''ホ'\'' me!'
   print -r -- ${(q+)foo}
-- 
2.38.1




