Zsh Mailing List Archive

Re: multi-byte text decoding error can break word splitting by read at least



On Sun, Apr 27, 2025 at 5:41 PM Stephane Chazelas <stephane@xxxxxxxxxxxx> wrote:
>
> There was some recent bug report on the bash mailing list about
> "read" missing the delimiter when it follows a truncated
> character, but zsh has similar issues when it comes to doing
> IFS splitting on the record once it has been read:
>
> $ print 'a\302×b' | IFS=× read -rA a; typeset a
> a=( $'a\M-B×b' )
>
> Wasn't split on ×.
>
> One might argue that doing reliable word splitting on non-text
> is illusory anyway, but note that the latest version of the
> POSIX standard now requires that splitting be done by looking
> for the byte encodings of the characters in $IFS which would
> make the behaviour above non-conformant.
>
> See https://www.austingroupbugs.net/view.php?id=1920 for some
> discussion on that though.

I think we can probably leave it to users to do this if they want to:
% sbread() { setopt localoptions nomultibyte; read "$@" }
% print 'a\302×b' | IFS=× sbread -rA a
% typeset -p a
typeset -a a=( $'a\M-B' '' b )
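
To see where the empty middle element comes from, it can help to look
at the raw bytes. A rough sketch, assuming the script is saved as UTF-8;
hexbytes is a hypothetical helper made up for illustration, not
something zsh provides. × encodes as two bytes, and with multibyte off
each byte of $IFS presumably acts as a delimiter of its own, which would
give the fields a\302, empty, and b:

```shell
# Hypothetical helper (not from the thread): dump stdin as hex bytes.
hexbytes() { od -An -tx1 | tr -d ' \n'; echo; }

# The delimiter on its own, assuming this file is UTF-8 encoded.
printf '×' | hexbytes          # c397

# The record from the example: a, a stray \302, the delimiter, b.
printf 'a\302×b' | hexbytes    # 61c2c39762
```

If that reading is right, this is per-byte splitting rather than a match
on the full encoding of ×, so input containing a lone \303 or \227 would
be split as well.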

-- 
Mikael Magnusson



