Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: zsh/pcre has errors with unicode bytes



정누리 wrote on Mon, 13 Jul 2020 11:53 +0900:
> $ LC_ALL='C'
> $ str='Hi😊'
> $ for (( i = 1; i <= ${#str}; ++i )); do                     
>       byte="$str[i]"                  
>       [[ $byte -pcre-match [a-zA-Z0-9] ]] && echo $byte || echo 'no match'
>   done
> >> H  
>    i
>    zsh: pcre_exec() error [-10]

From /usr/include/pcre.h on my system:

#define PCRE_ERROR_BADUTF8         (-10)  /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF16        (-10)  /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF32        (-10)  /* Same for 8/16/32 */

So pcre expects the subject string to be valid UTF-8, despite the locale.
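
To illustrate why each of the four iterations after "i" fails, here is a
small sketch in portable shell (it should run in zsh as well as sh).  It is
only an illustration, not anything from zsh or PCRE: the helper names
hex_bytes and lone_byte_is_valid_utf8 are made up for this example.  The
loop in the report ran six times, so $str[i] was evidently indexing bytes;
the emoji U+1F60A is the four bytes F0 9F 98 8A in UTF-8, and none of those
is a valid UTF-8 string when passed on its own.

```shell
# Print the bytes of a string as lowercase hex, two digits per byte.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# A single byte on its own is valid UTF-8 only if it is plain ASCII (< 0x80).
# The argument is the byte as two hex digits (hypothetical helper).
lone_byte_is_valid_utf8() {
    [ "$((0x$1))" -lt 128 ]
}

# 'Hi' followed by U+1F60A, written as octal escapes so the script is pure ASCII.
str=$(printf 'Hi\360\237\230\212')

# Split the hex dump into one two-digit token per byte and classify each.
for b in $(hex_bytes "$str" | sed 's/../& /g'); do
    if lone_byte_is_valid_utf8 "$b"; then
        echo "$b: valid on its own"
    else
        echo "$b: not valid UTF-8 on its own"
    fi
done
```

The first two bytes (48 and 69, "H" and "i") pass; the remaining four are
the ones for which a UTF-8-mode match would be expected to report a bad
subject string.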

Actually, wait.  We don't know what the locale is.  I don't build PCRE,
but could you try that again with «export LC_ALL='C'» at the start?

If that doesn't force it to use ASCII, try unsetting the MULTIBYTE
option.  See zpcre_utf8_enabled() (in Src/Modules/pcre.c).

Cheers,

Daniel


>    no match
>    zsh: pcre_exec() error [-10]
>    no match
>    zsh: pcre_exec() error [-10]
>    no match
>    zsh: pcre_exec() error [-10]
>    no match
> 
> Thanks for reading.


