Zsh Mailing List Archive
PATCH: handle Stéphane correctly
- X-seq: zsh-workers 54688
- From: Mikael Magnusson <mikachu@xxxxxxxxx>
- To: zsh-workers@xxxxxxx
- Cc: Stephane Chazelas <stephane@xxxxxxxxxxxx>
- Subject: PATCH: handle Stéphane correctly
- Date: Sat, 6 Jun 2026 17:20:03 +0200
- Archived-at: <https://zsh.org/workers/54688>
- In-reply-to: <ah6A73LEdmJpFIFZ@chazelas.org>
- List-id: <zsh-workers.zsh.org>
- References: <ah6A73LEdmJpFIFZ@chazelas.org>
In the original report we had [[ é = *$'\xa9' ]], which has op == P_STAR.
In this case charstart is filled out inline; in the other cases it is
done by the patrepeat function. If ptlen > P_LS_LEN(next) we have already
returned. If ptlen < P_LS_LEN(next) we have backtracked into the region
where charstart is valid, so we can check it. Finally, if they are equal
we know we are on a valid character boundary (but charstart might not be
valid here; in fact multiple tests fail if you leave out the ptlen
check).
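
To illustrate the boundary check described above, here is a minimal
standalone sketch. It is not the code in pattern.c: the helper names
mark_char_starts and tail_matches_on_boundary are hypothetical, the
charstart array here is only an analogue of the one used in patmatch,
and the sketch assumes valid UTF-8 input and strings of at most 256
bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Mark every byte that begins a character: 1 for an ASCII or lead
 * byte, 0 for a UTF-8 continuation byte (bit pattern 10xxxxxx).
 * Assumption: the input is valid UTF-8.
 */
void mark_char_starts(const unsigned char *s, size_t len, char *charstart)
{
    for (size_t i = 0; i < len; i++)
        charstart[i] = (s[i] & 0xC0) != 0x80;
}

/*
 * Return 1 if `tail` equals the last bytes of `s` AND that tail
 * begins on a character boundary; return 0 otherwise.
 */
int tail_matches_on_boundary(const char *s, const char *tail)
{
    size_t slen = strlen(s), tlen = strlen(tail);
    char charstart[256];

    if (tlen > slen || slen > sizeof(charstart))
        return 0;
    mark_char_starts((const unsigned char *)s, slen, charstart);

    /* Position the comparison at the end of the string. */
    size_t pos = slen - tlen;

    /* Refuse to start in the middle of a multibyte character. */
    if (pos < slen && !charstart[pos])
        return 0;
    return memcmp(s + pos, tail, tlen) == 0;
}
```

With this sketch, tail_matches_on_boundary("\xc3\xa9", "\xa9") returns 0
because the raw byte would start inside é, while
tail_matches_on_boundary("St\xc3\xa9phane", "phane") returns 1.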
Separately,
% [[ éé = é#$'\xa9' ]]; echo $?
pattern.c:4113: closure following more than one character
0
% [[ éé = (é)#$'\xa9' ]]; echo $?
1
so just treat multibyte single characters as "not simple", which fixes
that case too.
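
As a rough illustration of what treating é# as (é)# means, the sketch
below repeats the whole multibyte literal as a unit and then requires
the tail to match the rest of the string exactly. The function name
match_group_closure is hypothetical and this is not how the zsh
pattern compiler is implemented; it only models the expected result.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `s` is zero or more whole copies of `lit` followed by
 * exactly `tail`; return 0 otherwise. `lit` must be non-empty.
 */
int match_group_closure(const char *s, const char *lit, const char *tail)
{
    size_t litlen = strlen(lit);

    if (litlen == 0)
        return 0;
    for (;;) {
        /* Try to finish the match with the tail at this position. */
        if (strcmp(s, tail) == 0)
            return 1;
        /* Otherwise consume one more whole copy of the literal. */
        if (strncmp(s, lit, litlen) != 0)
            return 0;
        s += litlen;
    }
}
```

Here match_group_closure("\xc3\xa9\xc3\xa9", "\xc3\xa9", "\xa9") returns 0,
which corresponds to the status 1 (no match) shown for the (é)# form above.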
---
I inverted the conditions in the new tests so that the "|| print fail n"
pattern works. Otherwise we would need to expect status 1, and then the test
harness doesn't show the diff of the output at all, which defeats the purpose
of printing which case failed.
Src/pattern.c | 17 +++++++++++++----
Test/D07multibyte.ztst | 17 +++++++++++++++++
2 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/Src/pattern.c b/Src/pattern.c
index 1e0ae88d99..017c20a1ea 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -1369,6 +1369,11 @@ patcomppiece(int *flagp, int paren)
}
}
slen = (patparse - str0) - nmeta;
+#ifdef MULTIBYTE_SUPPORT
+ if ((patglobflags & GF_MULTIBYTE) && slen > 1)
+ /* for multibyte single characters, treat x# as (x)# */
+ flags &= ~P_SIMPLE;
+#endif
/* First add length, which is a long */
patadd((char *)&slen, 0, sizeof(long), 0);
/*
@@ -3348,10 +3353,14 @@ patmatch(Upat prog)
return 0;
/* Yes, just position appropriately and test. */
patinput += ptlen - P_LS_LEN(next);
- /*
- * Here we will need to be careful that patinput is not
- * in the middle of a multibyte character.
- */
+#ifdef MULTIBYTE_SUPPORT
+ /* Make sure we aren't in the middle of
+ * a multibyte character */
+ if ((patglobflags & GF_MULTIBYTE) &&
+ ptlen < P_LS_LEN(next) &&
+ !charstart[patinput - start])
+ return 0;
+#endif
/* Continue loop with P_EXACTLY test. */
break;
}
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index b65912fe97..28cb9d907e 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -556,6 +556,23 @@ F:This is considered a bugfix in zsh
[[ $'\xe3\x83\x9b' = ? ]] || print fail 4
0:Testing incomplete and invalid multibyte character components
+ [[ é != *$'\xc3'* ]] || print fail 1
+ [[ é != *$'\xa9'* ]] || print fail 2
+ [[ Stéphane != *$'\xc3'* ]] || print fail 3
+ [[ Stéphane != *$'\xa9'* ]] || print fail 4
+ [[ é != $'\xc3'* ]] || print fail 5
+ [[ é != *$'\xa9' ]] || print fail 6
+0:Raw bytes don't match multibyte characters in * patterns
+
+ [[ éé != é#$'\xa9' ]] || print fail 1
+ [[ é != [aé]#$'\xa9' ]] || print fail 2
+ [[ é != [^x]#$'\xa9' ]] || print fail 3
+ [[ aé != [aé]#$'\xa9' ]] || print fail 4
+ [[ aé != [^x]#$'\xa9' ]] || print fail 5
+ [[ aé != [aé]##$'\xa9' ]] || print fail 6
+ [[ aé != [^x]##$'\xa9' ]] || print fail 7
+0:Raw bytes don't match multibyte characters part 2
+
print -r -- ${(q+):-ホ}
foo='She said "ホ". I said "You can'\''t '\''ホ'\'' me!'
print -r -- ${(q+)foo}
--
2.38.1