Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
Re: A comment about "slurp" and -o multibyte
- X-seq: zsh-users 29511
- From: Roman Perepelitsa <roman.perepelitsa@xxxxxxxxx>
- To: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
- Cc: Zsh Users <zsh-users@xxxxxxx>
- Subject: Re: A comment about "slurp" and -o multibyte
- Date: Wed, 17 Jan 2024 07:07:50 +0100
- Archived-at: <https://zsh.org/users/29511>
- In-reply-to: <CAH+w=7YpEjmzROcrOsqwJ+EMsa7dQUMFQKJoY7YqFC1VpBGtzQ@mail.gmail.com>
- List-id: <zsh-users.zsh.org>
- References: <CAH+w=7YpEjmzROcrOsqwJ+EMsa7dQUMFQKJoY7YqFC1VpBGtzQ@mail.gmail.com>
On Wed, Jan 17, 2024 at 4:46 AM Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
>
> On Sun, Jan 14, 2024 at 2:34 AM Roman Perepelitsa
> <roman.perepelitsa@xxxxxxxxx> wrote:
> >
> > function slurp() {
> >   emulate -L zsh -o no_multibyte
> >   [...]
> >   typeset -g REPLY=${(j::)content}
> > }
>
> Although the function faithfully reads the input stream into $REPLY,
> later references to $REPLY with the multibyte option back in effect
> will (re-)interpret the content as multibyte characters. This may not
> be what's desired.
>
> % slurp < =zsh
> % () {
>   print $#REPLY
>   print ${(m)#REPLY}
>   print ${(mm)#REPLY}
>   setopt localoptions nomultibyte
>   print $#REPLY
> }
> 872903 <-- number of characters
> 873259 <-- width of printable characters
> 872383 <-- number of glyphs
> 878288 <-- actual number of bytes
>
> (Of course those first three numbers are all garbage because it's just
> interpreting an executable as wide character text.)
To me this behavior looks as expected. It's consistent with `read`,
`sysread` and command substitution.
% head -c $((1 << 20)) </dev/urandom | tr '\0' x >1MB
% slurp <1MB
% IFS= read -rd '' read <1MB
% zmodload zsh/system
% sysread -s $((1 << 20)) sysread <1MB
% procsubst=${"$(<1MB; print -n .)"%.}
% () {
  print -r -- $#REPLY $#read $#sysread $#procsubst
  setopt local_options no_multibyte
  print -r -- $#REPLY $#read $#sysread $#procsubst
}
1008389 1008389 1008389 1008389
1048576 1048576 1048576 1048576
Roman.