Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
pattern matching on parts of characters
- X-seq: zsh-workers 54661
- From: Stephane Chazelas <stephane@xxxxxxxxxxxx>
- To: Zsh hackers list <zsh-workers@xxxxxxx>
- Subject: pattern matching on parts of characters
- Date: Tue, 2 Jun 2026 08:26:54 +0100
- Archived-at: <https://zsh.org/workers/54661>
- List-id: <zsh-workers.zsh.org>
- Mail-followup-to: Zsh hackers list <zsh-workers@xxxxxxx>
$ locale charmap
UTF-8
$ printf é | od -An -vtx1
c3 a9
$ [[ Stéphane = *$'\xc3'* ]]; echo $?
1
$ [[ Stéphane = *$'\xa9'* ]]; echo $?
1
$ [[ é = *$'\xc3'* ]]; echo $?
1
$ [[ é = *$'\xa9'* ]]; echo $?
1
$ [[ é = $'\xc3'* ]]; echo $?
1
$ [[ é = *$'\xa9' ]]; echo $?
0
That it finds neither $'\xc3' nor $'\xa9' in Stéphane or é is fine
by me, as matching is meant to work at the character level, but the
result of the last command, where it does find a $'\xa9' /character/
at the end of é, is inconsistent and seems like a bug to me.
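Here is a minimal Python sketch of the two possible readings. It assumes, in line with the "character" framing above, that under character-level matching a byte which is not part of a valid character counts as a character of its own. Python's `fnmatch` stands in for the glob matcher. `char_match` and `byte_match` are hypothetical helpers for illustration, not zsh internals. A consistent character-level reading rejects all four patterns tried above on é, and a consistent byte-level reading accepts all four. zsh's mix, where only `*$'\xa9'` succeeds, matches neither reading.

```python
import fnmatch


def char_match(subject: bytes, pattern: bytes, encoding: str) -> bool:
    """Character-level glob match (a model, not zsh's code).

    Both sides are decoded first. Each valid sequence becomes one
    character; each byte that is not part of a valid character becomes
    its own stand-in character (a lone surrogate, via "surrogateescape"),
    so a stray byte can only ever match that same stray byte.
    """
    s = subject.decode(encoding, errors="surrogateescape")
    p = pattern.decode(encoding, errors="surrogateescape")
    return fnmatch.fnmatchcase(s, p)


def byte_match(subject: bytes, pattern: bytes) -> bool:
    """Byte-level glob match: every byte is treated as a character."""
    return fnmatch.fnmatchcase(subject, pattern)


e_acute = "é".encode("utf-8")  # b"\xc3\xa9", as od showed

# The four patterns from the transcript: *\xc3*, *\xa9*, \xc3*, *\xa9
for pat in (b"*\xc3*", b"*\xa9*", b"\xc3*", b"*\xa9"):
    print(pat, char_match(e_acute, pat, "utf-8"), byte_match(e_acute, pat))
```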
It gets worse in locales using encodings such as GB18030, where
the encoding of one character can contain that of another (below,
the second byte of 媥 is 0x78, which is also the encoding of "x").
For instance, in a zh_CN.gb18030 locale:
$ LANG=zh_CN.gb18030 luit
$ locale charmap
GB18030
$ printf '媥' | od -tx1 -vAn
8b 78
$ [[ '媥' = *x ]]; echo $?
0
$ [[ '媥' = *x* ]]; echo $?
1
That's with 5.9 and the current head of the master branch.
From https://unix.stackexchange.com/questions/779151/to-check-whether-first-or-last-character-of-a-string-is-x/779410#779410
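The same kind of model applied to the GB18030 case, again as a hedged sketch: `char_match` is a hypothetical helper and Python's `gb18030` codec stands in for the locale's decoder. The two bytes 8b 78 decode to a single character, so at character level neither `*x` nor `*x*` should match; at byte level both would. zsh returning 0 for `*x` and 1 for `*x*` again sits between the two.

```python
import fnmatch


def char_match(subject: bytes, pattern: bytes, encoding: str) -> bool:
    # Decode first, then glob-match character by character (a model,
    # not zsh's code); stray bytes become stand-in characters.
    s = subject.decode(encoding, errors="surrogateescape")
    p = pattern.decode(encoding, errors="surrogateescape")
    return fnmatch.fnmatchcase(s, p)


subject = b"\x8b\x78"  # the bytes od showed for the sample character
text = subject.decode("gb18030")
print(len(text), text == "x")  # a single character, and it is not "x"

for pat in (b"*x", b"*x*"):
    # character-level result, then byte-level result
    print(pat, char_match(subject, pat, "gb18030"), fnmatch.fnmatchcase(subject, pat))
```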
For the record, in bash, when either the subject or the pattern
cannot be decoded as text, pattern matching is done at byte level,
which IMO is wrong, so we wouldn't want to go there.
In bash, [[ Stéphane = *$'\xc3'* ]] returns true: *$'\xc3'*
cannot be decoded as text, so matching is done at byte level. But
[[ $'Stéphane \200' = *[âû]* ]] also returns true, because the
subject cannot be decoded as text, and once the pattern is
interpreted at byte level its meaning changes radically: [âû]
becomes a set of the bytes that encode â (c3 a2) and û (c3 bb),
and c3 is also the first byte of é. It can get worse still; see
https://lists.gnu.org/archive/html/bug-bash/2021-02/msg00054.html
for details.
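To make the byte-level reinterpretation concrete, here is a small Python sketch. Python's `fnmatch`, which accepts either bytes or str, is only a stand-in for bash's matcher; what it shows is how the bracket expression changes meaning when read byte by byte, not how bash implements it.

```python
import fnmatch

# The subject from the bash example: "Stéphane " then a stray 0x80,
# which makes the whole string undecodable as UTF-8.
subject = "Stéphane ".encode("utf-8") + b"\x80"

# At byte level, *[âû]* becomes *[\xc3\xa2\xc3\xbb]*: a set of three
# distinct bytes (0xc3 is listed twice), and 0xc3 also starts é.
pattern = "*[âû]*".encode("utf-8")
print(pattern)
print(fnmatch.fnmatchcase(subject, pattern))  # byte level: matches

# Likewise, a lone 0xc3 "finds" the first half of é at byte level.
print(fnmatch.fnmatchcase("Stéphane".encode("utf-8"), b"*\xc3*"))

# Character level: decode first, keeping 0x80 as a stand-in character.
text = subject.decode("utf-8", errors="surrogateescape")
print(fnmatch.fnmatchcase(text, "*[âû]*"))  # no â or û: no match
```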
It's even worse in current versions of glibc's fnmatch():
https://sourceware.org/bugzilla/show_bug.cgi?id=31075
--
Stephane