Zsh Mailing List Archive
Re: multi-byte text decoding error can break word splitting by read at least
- X-seq: zsh-workers 54689
- From: Mikael Magnusson <mikachu@xxxxxxxxx>
- To: Zsh hackers list <zsh-workers@xxxxxxx>
- Subject: Re: multi-byte text decoding error can break word splitting by read at least
- Date: Sat, 6 Jun 2026 23:03:03 +0200
- Archived-at: <https://zsh.org/workers/54689>
- In-reply-to: <yzyhuwlykbdtojs6fsbyb6iynwri7pwe3wtk5rsgo52spni5ry@g5o5fzavdboh>
- List-id: <zsh-workers.zsh.org>
- References: <yzyhuwlykbdtojs6fsbyb6iynwri7pwe3wtk5rsgo52spni5ry@g5o5fzavdboh>
On Sun, Apr 27, 2025 at 5:41 PM Stephane Chazelas <stephane@xxxxxxxxxxxx> wrote:
>
> There was a recent bug report on the bash mailing list about
> "read" missing the delimiter when it follows a truncated
> character, but zsh has similar issues when it comes to doing
> IFS splitting on the record once it has been read:
>
> $ print 'a\302×b' | IFS=× read -rA a; typeset a
> a=( $'a\M-B×b' )
>
> Wasn't split on ×.
>
> One might argue that doing reliable word splitting on non-text
> is illusory anyway, but note that the latest version of the
> POSIX standard now requires that splitting be done by looking
> for the byte encodings of the characters in $IFS which would
> make the behaviour above non-conformant.
>
> See https://www.austingroupbugs.net/view.php?id=1920 for some
> discussion on that though.
I think we can probably leave it to users to do this if they want to:
% sbread() { setopt localoptions nomultibyte; read "$@" }
% print 'a\302×b' | IFS=× sbread -rA a
% typeset -p a
typeset -a a=( $'a\M-B' '' b )
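The empty element in the middle appears to follow from × being two bytes in UTF-8: with multibyte disabled, each byte in $IFS presumably acts as its own separator, so the adjacent pair delimits an empty field. A quick sketch to inspect the bytes involved, assuming the text is UTF-8 encoded (hexbytes is a hypothetical helper, not part of zsh):

```sh
# hexbytes: hypothetical helper that prints stdin as space-separated hex bytes.
# od dumps one hex value per byte; xargs collapses od's padding and newlines.
hexbytes() { od -An -tx1 | xargs; }

# The delimiter alone: U+00D7 is two bytes in UTF-8.
printf '×' | hexbytes          # c3 97

# The record from the example: 'a', a lone 0xC2 lead byte, then ×, then 'b'.
printf 'a\302×b' | hexbytes    # 61 c2 c3 97 62
```

Read bytewise, the record is 61 c2 | c3 | 97 | 62, which matches the three fields shown above: $'a\M-B', an empty string between the two delimiter bytes, and b.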
--
Mikael Magnusson