Zsh Mailing List Archive
Re: Deadlock when receiving kill-signal from child process
- X-seq: zsh-workers 35984
- From: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
- To: zsh-workers@xxxxxxx
- Subject: Re: Deadlock when receiving kill-signal from child process
- Date: Tue, 4 Aug 2015 23:53:59 -0700
- In-reply-to: <CA+=GgY7uGzCYEKLBzqrt=ct6q72WFC5w1jMB5RDNe60J-wUz=Q@mail.gmail.com>
- References: <CA+=GgY7mHkyK4NJQ6m7y-HpVPKOuKx3-bkJqRHriKzZ662_iwA@mail.gmail.com> <150803085228.ZM24837@torch.brasslantern.com> <CA+=GgY5iZfgUag_V1jqmCv4=PUGBmaV2cNWTDjSO4DAZ+zm-iQ@mail.gmail.com> <150803135818.ZM24977@torch.brasslantern.com> <CA+=GgY7uGzCYEKLBzqrt=ct6q72WFC5w1jMB5RDNe60J-wUz=Q@mail.gmail.com>
On Aug 5, 12:52am, Mathias Fredriksson wrote:
}
} I have however managed to get a dump with strace on Gentoo
Based on this strace plus a GDB stack trace Mathias sent me off-list,
I think the problem may be here:
zwaitjob(int job, int wait_cmd)
{
    int q = queue_signal_level();
    Job jn = jobtab + job;

    dont_queue_signals();
    child_block();		/* unblocked during signal_suspend() */
    queue_traps(wait_cmd);
...
    while (!errflag && jn->stat &&
           !(jn->stat & STAT_DONE) &&
           !(interact && (jn->stat & STAT_STOPPED))) {
        signal_suspend(SIGCHLD, wait_cmd);
I suspect what's happening is that the child represented by "job" exits
during dont_queue_signals(), which is a macro that expands to a loop
calling zhandler(), which will process TRAPUSR1 (or other traps).
Somehow this results in jn->stat never being marked STAT_DONE. Perhaps
this happens because the "thisjob" global gets temporarily changed in
the TRAP* function? Anyway signal_suspend(SIGCHLD, wait_cmd) is then
called when there are no children left, so we never receive another
SIGCHLD to break out of the while-loop, and even if we do come out of
signal_suspend() the while-loop goes around and we block again.
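To make the general hazard concrete (suspending for SIGCHLD when the only child may already be gone), here is a minimal stand-alone sketch of the usual POSIX guard. It is not zsh code: child_done, on_sigchld() and wait_without_lost_wakeup() are hypothetical names made up for illustration, and the handler is far simpler than zhandler(). The idea is that SIGCHLD stays blocked from before the fork until sigsuspend() atomically unblocks it, so the "done" flag cannot change between the test and the sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag, set by the handler once a child has been reaped. */
static volatile sig_atomic_t child_done = 0;

static void on_sigchld(int sig)
{
    int saved_errno = errno;	/* waitpid() may clobber errno */

    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        child_done = 1;
    errno = saved_errno;
}

/* Fork a child that exits at once, then wait for it without losing
 * the wakeup.  Returns 0 on success, -1 on error. */
int wait_without_lost_wakeup(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask, now;
    pid_t pid;

    child_done = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
        return -1;

    /* Block SIGCHLD *before* the child can possibly exit. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) < 0)
        return -1;

    pid = fork();
    if (pid == 0)
        _exit(0);

    if (pid > 0) {
        /* Invariant: SIGCHLD is blocked while we test the flag. */
        sigprocmask(SIG_BLOCK, NULL, &now);
        assert(sigismember(&now, SIGCHLD) == 1);

        /* sigsuspend() installs wait_mask and sleeps in one atomic
         * step, so a pending SIGCHLD is delivered here and not lost. */
        wait_mask = old_mask;
        sigdelset(&wait_mask, SIGCHLD);
        while (!child_done)
            sigsuspend(&wait_mask);
    }

    /* Restore the caller's signal mask and SIGCHLD disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return pid > 0 ? 0 : -1;
}
```

If the flag were tested with SIGCHLD unblocked, the handler could run between the test and the suspend, and the suspend would then wait for a signal that has already been consumed.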
I'm not sure what to do if this is in fact the problem, because
e.g. calling child_block() before dont_queue_signals() has other
problems.
However, it's also possible that a child has exited even before its
job table entry has been created. One way to find out if that has
happened is this patch:
diff --git a/Src/signals.c b/Src/signals.c
index 3950ad1..d72c7d6 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -519,6 +519,7 @@ wait_for_processes(void)
 		 * will get added on to the next found process that
 		 * terminates.
 		 */
+		zwarn("no job table entry for pid %d", pid);
 		get_usage();
 	    }
 	    /*
Mathias, if you could apply that patch and try again to reproduce the
deadlock, it might tell us something.
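For anyone following along, the second scenario (a child reaped before its job table entry exists) can be reproduced in miniature with the sketch below. Again this is not zsh code: struct job_table, table_add(), table_has(), reap_one() and spawn_and_reap() are hypothetical stand-ins, and the "table" is just an array of pids. The ordering is forced by hand here, whereas in a real shell it would depend on when SIGCHLD is delivered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_JOBS 8

/* Hypothetical stand-in for a job table: a flat list of child pids. */
struct job_table {
    pid_t pids[MAX_JOBS];
    int count;
};

static void table_add(struct job_table *t, pid_t pid)
{
    assert(t->count < MAX_JOBS);
    t->pids[t->count++] = pid;
}

static int table_has(const struct job_table *t, pid_t pid)
{
    for (int i = 0; i < t->count; i++)
        if (t->pids[i] == pid)
            return 1;
    return 0;
}

/* Reap one exited child (blocking) and report whether the table knew
 * about it: 1 if known, 0 if there was no entry, -1 on error. */
int reap_one(const struct job_table *t)
{
    pid_t pid = waitpid(-1, NULL, 0);

    if (pid < 0)
        return -1;
    if (!table_has(t, pid)) {
        fprintf(stderr, "no job table entry for pid %d\n", (int)pid);
        return 0;
    }
    return 1;
}

/* Fork a child that exits at once.  With register_first set, the pid
 * is recorded before reaping; otherwise it is recorded too late. */
int spawn_and_reap(int register_first)
{
    struct job_table t = { {0}, 0 };
    pid_t pid = fork();
    int known;

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);

    if (register_first)
        table_add(&t, pid);
    known = reap_one(&t);
    if (!register_first)
        table_add(&t, pid);	/* entry created after the reap */
    return known;
}
```

Calling spawn_and_reap(0) takes the "too late" ordering and prints the warning; spawn_and_reap(1) registers the pid first and stays silent.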
--
Barton E. Schaefer