Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: processing of pipelines



On Aug 31, 11:41am, Sweth Chandramouli wrote:
> Subject: processing of pipelines
> 	i've been part of a recent discussion in the comp.unix.shell
> newsgroup about how different shells process pipelines, and thought i 
> should ask this group about zsh's behaviour.

Sven is probably the best person to give details of the internals, but:

> some quick tests first 
> led me to believe that zsh runs each command in a pipeline
> sequentially in the current process;

If you think about this for any length of time, you'll see that it is
impossible.  If the command is not a shell builtin, the shell MUST fork()
in order to exec() the external program.  Even if the command is a builtin,
having the same process both write and read from the opposite ends of the
same pipe is an invitation to deadlock [which is why `cmd` and $(cmd) also
fork, and I wish Sven luck inventing a way around it that doesn't involve
temporary files and just as much overhead as forking].
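The visible consequence of that fork can be sketched as follows. This is a minimal example that should behave the same in any POSIX-style shell; the variable names x and y are only illustrative.

```shell
# Assumption: any POSIX-style shell; x and y are illustrative names.
x=outer
# The command substitution runs in a forked subshell, so this
# assignment to x is lost when the subshell exits.
y=$(x=inner; echo "$x")
echo "$x $y"   # prints "outer inner"
```

The parent's x is untouched because the assignment happened in a separate process.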

> some more research now makes me 
> think that this only appeared to be the case because i was testing
> using a no-op on one side of the pipe, and zsh somehow checks to see 
> if the command on the right side of a pipe is actually reading from
> the pipe; if not, it treats the pipe like a semicolon.

No, that's also impossible.  What zsh does in some circumstances is use a
second pipe between the parent and the child as a semaphore, to delay the
exec() until the parent has successfully entered the child in its process
table.  This avoids a potential race condition where the child may run to
completion and exit before the parent has even returned from the fork()
call.  It does mean that extremely tiny jobs start a bit slower in zsh.

> 	my new hypothesis, then, is that zsh (like ksh) runs all commands
> in a pipeline in sub-processes except for the last command, which is
> run in the current process

Once again, "run in the current process" is not possible except for shell
builtins.  Zsh *does* run the last command in a pipeline in the current
shell when the command *is* a builtin, even if that builtin is a loop,
which is AFAIK different from any other shell; it means that you can do
things like

    some external command | while read line; do export $line; done

and the current shell's environment will actually be modified.  Ksh would
have to use

    some external command > somefile
    . somefile

to get a similar effect.
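A small probe for this behaviour, as a sketch rather than anything zsh itself provides (the names result and line are made up for the example). It should report "current shell" when the last stage of the pipeline runs in the invoking shell, as described above for zsh, and "subshell" in a shell that forks every stage.

```shell
# Assumption: result and line are illustrative names, not zsh features.
result="subshell"
# If the while loop runs in the current shell, the assignment below
# survives the pipeline; if it runs in a child process, it is lost.
printf 'current shell\n' | while read -r line; do result=$line; done
echo "last pipeline stage ran in: $result"
```

The answer depends on which shell runs the snippet, so no fixed output is shown here.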

> but that when the pipe isn't actually
> being used, it splits the pipeline up into smaller lists to be
> processed individually.  yes?  no?  something else entirely?

Something else entirely.

Here's a way to peek at the process tree:

alias -g child='perl -e '\''print @ARGV, ": ", getppid(), "-->", $$, "\n";
			while (<STDIN>) { print; }'\'

echo $$ | child A | child B | child C | child D
D: 5812-->5901
C: 5812-->5900
B: 5812-->5899
A: 5812-->5898
5812

Now try sticking an "exec" in the middle somewhere:

echo $$ | child A | exec child B | child C | child D
D: 5812-->5910
C: 5812-->5909
B: 5812-->5908
A: 5812-->5907
5812

Note that it made no difference; child B was already being exec()d.  Now
put the exec at the end (be sure you start a new shell to try this, or
you'll never see the output):

echo $$ | child A | child B | child C | exec child D
D: 5518-->5812
C: 5812-->5914
B: 5812-->5913
A: 5812-->5912
5812

Now 5812 has exited; it exec'd the last perl in the pipeline, replacing
the shell with its child.
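The same effect can be seen without sacrificing your login shell. The sketch below assumes a POSIX sh is on the PATH and uses a throwaway sh in place of the interactive shell; the names out, before and after are illustrative. Since exec replaces the process image without forking, both PIDs should be identical.

```shell
# Assumption: a POSIX sh is on PATH; the throwaway sh stands in for
# the interactive shell so the current session survives.
out=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
before=$(printf '%s\n' "$out" | sed -n 1p)   # PID printed before the exec
after=$(printf '%s\n' "$out" | sed -n 2p)    # PID printed by the replacement
echo "before exec: $before, after exec: $after"
```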

Note the slight difference when you wrap the whole thing in parens:

(echo $$ | child A | child B | child C | child D) 
D: 5942-->5963
C: 5963-->5967
B: 5963-->5966
A: 5963-->5965
5942

(echo $$ | child A | child B | child C | exec child D) 
D: 5942-->5968
C: 5968-->5972
B: 5968-->5971
A: 5968-->5970
5942

Zsh knows that it's safe to exec the last child when in a subshell, so it
does so even if you don't explicitly say "exec".
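A perl-free way to check for this implicit exec, offered as a rough sketch: it assumes sh and mktemp are available, that the snippet runs at the top level of the shell (so that $$ is the PID of the process forking the subshell), and the names tmp, child_ppid and verdict are invented for the example. Other shells may or may not apply the same optimisation, so either answer is possible.

```shell
# Assumption: a POSIX sh and mktemp are available; tmp, child_ppid and
# verdict are illustrative names.
tmp=$(mktemp)
# The inner sh reports its parent's PID. If the subshell exec'd it
# directly, that parent is the main shell ($$); otherwise the parent
# is the intermediate subshell process.
(sh -c 'echo $PPID') > "$tmp"
child_ppid=$(cat "$tmp")
rm -f "$tmp"
if [ "$child_ppid" = "$$" ]; then
    verdict="subshell exec'd the command"
else
    verdict="subshell forked again"
fi
echo "$verdict"
```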

Now try it with "child" as a shell function:

unalias \child
child() { perl -e 'print @ARGV, ": ", getppid(), "-->", $$, "\n";
                        while (<STDIN>) { print; }' $* }

echo $$ | child A | child B | child C | child D
D: 5942-->6021
C: 6019-->6020
B: 6017-->6018
A: 6015-->6016
5942

echo $$ | child A | child B | child C | exec child D
D: 5942-->6029
C: 6027-->6028
B: 6025-->6026
A: 6023-->6024
5942

Note that each shell function got its own process, even when the last one
was to be exec'd (though zsh still exits after the last function finishes,
as if it really had done an "exec").  This is so that complex process
management within the shell function is handled by the function process,
while the parent zsh manages the surrounding pipeline.
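One side effect of giving a function its own process is that nothing it assigns is visible afterwards. A minimal sketch, with tag and var as illustrative names; because tag is not the last stage of the pipeline, this should hold in any POSIX-style shell, not only zsh.

```shell
# Assumption: tag and var are illustrative names.
var=original
tag() { var=changed; cat; }   # tries to modify var, then copies stdin
# tag is not the last stage, so it runs in its own process and the
# assignment is lost when that process exits.
echo data | tag | cat         # prints "data"
echo "$var"                   # prints "original"
```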

Make sense?  BTW, there's a proposed patch to 3.1.6 that would change this
slightly in some circumstances, but the basic ideas are the same.


