Zsh Mailing List Archive
[bug] locale ctype not always honoured properly in pcre matching
- X-seq: zsh-workers 50653
- From: Stephane Chazelas <stephane@xxxxxxxxxxxx>
- To: Zsh hackers list <zsh-workers@xxxxxxx>
- Subject: [bug] locale ctype not always honoured properly in pcre matching
- Date: Tue, 20 Sep 2022 14:54:04 +0100
- Archived-at: <https://zsh.org/workers/50653>
- List-id: <zsh-workers.zsh.org>
- Mail-followup-to: Zsh hackers list <zsh-workers@xxxxxxx>
$ locale charmap
UTF-8
$ set -o rematchpcre
$ LC_ALL=C [ $'\xc3\xa9' '=~' '^..\z' ] && echo yes
yes
OK, in C locale, those two bytes are considered as two characters.
$ [ $'\xc3\xa9' '=~' '^..\z' ] && echo yes
$
OK, in UTF-8, those two bytes form one é character.
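As a side check outside zsh, the byte length of the test string can be confirmed with standard tools. This is only a minimal sketch: byte_count is a hypothetical helper name introduced for illustration, and the string is built with octal escapes because POSIX printf does not guarantee \x escapes.

```shell
# byte_count is a hypothetical helper, not part of zsh or any library.
# It prints the number of bytes in its first argument, whatever the locale.
byte_count() {
  # wc -c counts bytes; the arithmetic expansion strips any padding
  # that some wc implementations add around the number.
  echo $(( $(printf '%s' "$1" | wc -c) ))
}

# 0xC3 0xA9 is the UTF-8 encoding of é, written here as octal escapes.
e_acute=$(printf '\303\251')

byte_count "$e_acute"
```

The count is the same under any locale; whether those two bytes are treated as one character or two is what the locale's character set decides.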
$ LC_ALL=C [ $'\xc3\xa9' '=~' '^..\z' ] && echo yes
$
This is the same command as the first one, but now it doesn't match (?!). Instead:
$ LC_ALL=C [ $'\xc3\xa9' '=~' '^.\z' ] && echo yes
yes
It behaves as if the match were done in UTF-8.
The same goes for [[ ... =~ ... ]] in a fresh shell, with LC_ALL=C set inside a subshell:
$ PS1='$ ' zsh -f
$ set -o rematchpcre
$ (LC_ALL=C; [[ $'\xc3\xa9' =~ '^..\z' ]] && echo yes )
yes
$ [[ $'\xc3\xa9' =~ '^..\z' ]] && echo yes
$ (LC_ALL=C; [[ $'\xc3\xa9' =~ '^..\z' ]] && echo yes )
$
--
Stephane