Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author

Re: zsh poor performances while reading and testing ?

X-seq: zsh-users 24004
From: Stephane Chazelas <stephane.chazelas@xxxxxxxxx>
To: Marc Chantreux <eiro@xxxxxxxxx>
Subject: Re: zsh poor performances while reading and testing ?
Date: Wed, 3 Jul 2019 15:28:31 +0100
Cc: Zsh Users <zsh-users@xxxxxxx>
In-reply-to: <20190703135824.GA19289__20170.6622539618$1562162400$gmane$org@prometheus.u-strasbg.fr>
List-help: <mailto:zsh-users-help@zsh.org>
List-id: Zsh Users List <zsh-users.zsh.org>
List-post: <mailto:zsh-users@zsh.org>
List-unsubscribe: <mailto:zsh-users-unsubscribe@zsh.org>
Mail-followup-to: Marc Chantreux <eiro@xxxxxxxxx>, Zsh Users <zsh-users@xxxxxxx>
Mailing-list: contact zsh-users-help@xxxxxxx; run by ezmlm
References: <20190703135824.GA19289__20170.6622539618$1562162400$gmane$org@prometheus.u-strasbg.fr>

2019-07-03 15:58:24 +0200, Marc Chantreux:
[...]
> i recently made a benchmark to emphasize the gain of speed
> people can have using filters instead of pure shell loops.
> however i was surprised to see how slow zsh is compared to
> other shells when it comes to read data.
> 
> the interesting part of the benchmark is:
> 
>   for it (bash zsh ksh) {
>     TIMEFMT=": $it %U %S %E"
>     time $it  -c 'while read it; do : ; done < x > /dev/null'
>   }
> 
> : bash 4,95s 1,21s 6,18s
> : zsh 12,65s 28,12s 40,82s
> : ksh 9,87s 26,52s 36,42s
> 
> is there any obvious reason for that? is there a way to make
> it faster without diving in the C code?
[...]

The read builtin reads one byte at a time so as not to read past
the newline character. On seekable files, bash and ksh implement
an optimisation whereby they read more than one byte (128 bytes in
bash IIRC) and then seek back to just after the newline that ends
the line.
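The constraint every shell has to honour, whether it reads byte by
byte or over-reads and seeks back, is that after read returns, the
shared file offset sits just past the newline, so the next command
on the same descriptor sees the following line. A minimal sketch in
POSIX sh. The file name lines.txt and the function split_first_line
are made up for the illustration:

```sh
#!/bin/sh
# Hypothetical helper: read the first line of a file with the read
# builtin, then let cat consume whatever read left on the descriptor.
split_first_line() {
  {
    IFS= read -r first
    printf 'read: %s\n' "$first"
    echo '-- rest --'
    # cat inherits the shared file offset, which must point just past
    # the first newline however many bytes read fetched internally.
    cat
  } < "$1"
}

printf 'one\ntwo\nthree\n' > lines.txt   # a regular, seekable file
split_first_line lines.txt
```

It prints "read: one", the marker line, then "two" and "three". The
output is the same in every shell; only the system calls used to get
there differ.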

ksh93 goes even further in that it remembers the excess bytes it
has read and reuses them for the next builtin commands, causing
this kind of bug: https://github.com/att/ast/issues/15

You'll probably find that they are all equally inefficient on
non-seekable, non-peekable input such as pipes.
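On a pipe there is no lseek() to undo an over-read: whatever read()
takes out of the pipe is gone for the next command. The sketch below
(POSIX sh, hypothetical function name) shows the behaviour that must
be preserved, which is why, short of a trick like peeking, the shell
falls back to one-byte reads there:

```sh
#!/bin/sh
# Hypothetical helper: two read builtins followed by cat, all sharing
# the read end of the same pipe.
read_two_then_rest() {
  printf '1\n2\n3\n4\n5\n' | {
    IFS= read -r a
    IFS= read -r b
    printf 'read: %s %s\n' "$a" "$b"
    # Bytes taken from a pipe cannot be put back, so cat only sees
    # 3, 4 and 5 because read stopped exactly at each newline.
    cat
  }
}

read_two_then_rest
```

On Linux, running such a script under strace -f -e trace=read should
show the read builtin issuing one-byte read() calls on the pipe, while
cat reads in large chunks (exact output varies by shell).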

IIRC, ksh93 implements "|" with socketpair() instead of pipe()
on Linux so it can "peek" at the data (note the recvfrom() with
MSG_PEEK) to do this kind of optimisation:

$ strace  ksh -c 'seq 20 | read'
[...]
socketpair(AF_UNIX, SOCK_STREAM, 0, [3, 4]) = 0
shutdown(4, SHUT_RD)                    = 0
fchmod(4, 0200)                         = 0
shutdown(3, SHUT_WR)                    = 0
fchmod(3, 0400)                         = 0
[...]
fcntl(3, F_DUPFD, 0)                    = 0
[...]
recvfrom(0, "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14"..., 65536, MSG_PEEK, NULL, NULL) = 51
read(0, "1\n", 2)                       = 2
[...]

Again, that kind of optimisation can backfire and be invalid if
there's more than one reader on the pipe (not to mention the
problems on Linux, where /dev/fd/x doesn't work with socketpairs).

By the way, the syntax to read a line is

IFS= read -r line

not

read line

even in zsh.
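The difference shows up on a line with surrounding blanks and a
backslash: plain read strips leading and trailing IFS whitespace and
treats the backslash as an escape character, while IFS= read -r keeps
the line verbatim. A small POSIX sh sketch (compare_reads is a
hypothetical name):

```sh
#!/bin/sh
# Hypothetical helper: show what each form of read stores for one line
# containing two leading blanks, a backslash and two trailing blanks.
compare_reads() {
  line_in='  a\b  '
  # Plain read: blanks stripped, backslash removed as an escape.
  printf '%s\n' "$line_in" | { read line; printf 'read line: [%s]\n' "$line"; }
  # IFS= read -r: the line is stored exactly as it was.
  printf '%s\n' "$line_in" | { IFS= read -r line; printf 'IFS= read -r line: [%s]\n' "$line"; }
}

compare_reads
```

The first form stores "ab", the second "  a\b  ".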

-- 
Stephane

Follow-Ups:
- Re: zsh poor performances while reading and testing ?
  - From: Marc Chantreux
