Zsh Mailing List Archive

Re: [PATCH] Do not send duplicate signals when MONITOR is set



On Mon, Jun 7, 2021 at 10:28 AM Erik Paulson <epaulson10@xxxxxxxxx> wrote:
>
> I run emacs as a daemon and use the emacsclient program to connect to
> it. I noticed that when I suspended the emacsclient program and
> resumed it in zsh, the program would sporadically crash. After digging
> into the code, I realized that emacsclient was receiving two SIGCONTs,
> which caused it to send a malformed command to the daemon.
>
> I found that this return used to be present, but was removed in
> https://www.zsh.org/mla/workers/2018/msg01338.html while addressing
> another emacs issue.

I don't think it was removed ... similar code was added in two
separate places, but the "return" was only added in one of those.

Your patch adds that return in the second case.

The difference is that in the first case, the SIGCONT is received by a
job that is marked STAT_SUPERJOB and in the second case it's received
by a different job.

I believe this means that in the first case the superjob is in the
foreground and in the second case it isn't; rather, one of its
subjobs is.  In the first case zsh sends the signal to all the
subjobs and then to the process group.  In the second case it sends
the signal to the process group first and then falls into the loop
sending the signal to any subjobs that still appear to be stopped.
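To make the two orderings concrete, here is a simplified Python model. It is not zsh's code: the function name, the plain integers standing in for pids, the ("pid", n) / ("pgrp", n) tuples and the `still_stopped` set (standing in for whatever the loop actually tests) are all hypothetical. It only records the order in which a SIGCONT would be handed out on each path.

```python
from typing import List, Set, Tuple

# One delivery: ("pid", n) for a direct send, ("pgrp", n) for a process-group send.
Delivery = Tuple[str, int]


def sigcont_deliveries(is_superjob: bool, pgid: int, subjob_ids: List[int],
                       still_stopped: Set[int]) -> List[Delivery]:
    """Order in which this simplified (hypothetical) model hands out a SIGCONT."""
    deliveries: List[Delivery] = []
    if is_superjob:
        # First case (STAT_SUPERJOB): all subjobs first, then the process group.
        deliveries.extend(("pid", n) for n in subjob_ids)
        deliveries.append(("pgrp", pgid))
        return deliveries  # this path already returns here
    # Second case: the process group first ...
    deliveries.append(("pgrp", pgid))
    # ... then the loop over subjobs that still appear to be stopped.
    deliveries.extend(("pid", n) for n in subjob_ids if n in still_stopped)
    return deliveries
```

For example, `sigcont_deliveries(False, 100, [101, 102], {101, 102})` gives `[("pgrp", 100), ("pid", 101), ("pid", 102)]`. Assuming those subjobs are members of process group 100, each of them is handed the same SIGCONT twice: once through the group and once directly.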

In any case, I think a potential problem with placing an unconditional
"return" where your patch does is that signals other than SIGCONT
probably still need to be delivered to the subjobs.  PWS, any input
here?
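A sketch of that trade-off, again as a hypothetical Python model of the second path rather than zsh's code. The `early_return` parameter chooses between no return, an unconditional return after the process-group send, and a return taken only for SIGCONT. The model assumes the per-subjob loop tests the same "still appears stopped" condition for every signal, which may not match the real loop.

```python
from typing import List, Set, Tuple

# One delivery: ("pid", n) for a direct send, ("pgrp", n) for a process-group send.
Delivery = Tuple[str, int]


def second_case_deliveries(sig: str, pgid: int, subjob_ids: List[int],
                           still_stopped: Set[int],
                           early_return: str) -> List[Delivery]:
    """Second-case path with a hypothetical early return.

    early_return is "never" (no return), "always" (unconditional return
    after the process-group send) or "sigcont" (return only for SIGCONT).
    """
    if early_return not in ("never", "always", "sigcont"):
        raise ValueError(f"unknown early_return: {early_return!r}")
    result: List[Delivery] = [("pgrp", pgid)]
    if early_return == "always" or (early_return == "sigcont" and sig == "SIGCONT"):
        return result  # the per-subjob loop is skipped entirely
    # Assumed loop: direct sends to subjobs that still appear to be stopped.
    result.extend(("pid", n) for n in subjob_ids if n in still_stopped)
    return result
```

With "always", a SIGTERM never reaches the still-stopped subjobs directly, which is the concern raised above. The "sigcont" variant is only one possible narrower guard, not something proposed in this thread.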

> Note that on testing with Linux, it seems the kernel will suppress the
> second signal; in order to get a test program to detect it, I have to
> step through the code with the debugger. On OSX, where I originally
> detected this problem, I reliably get two signals delivered each time.

This is probably a difference in process scheduling rather than a
signal being suppressed.  For example, on Linux the order of events is:
1) zsh sends signal to process group
2) process group copies signal to all processes
3) those processes resume
4) zsh proceeds into makerunning() and clears the STAT_STOPPED flag
5) that makes the loop a no-op

Whereas on OSX,
1) zsh sends signal to process group
2) zsh proceeds into makerunning() so STAT_STOPPED is left in place
3) process group copies signal to all processes
4) the loop sends a second SIGCONT
5) those processes resume and get a double SIGCONT

(Steps 2 and 3 might be simultaneous or happen in either order.)
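The two timelines can be replayed in a small hypothetical Python model. It assumes a single event, here called "status_cleared", stands for the moment the stopped status tested by the loop is cleared; the event names and pids are illustrative only. The same three events in a different order give either one SIGCONT per process or two.

```python
from typing import Dict, List

# Hypothetical event names for the steps in the two lists above.
LINUX_ORDER = ["killpg", "status_cleared", "loop"]
OSX_ORDER = ["killpg", "loop", "status_cleared"]


def sigcont_copies(events: List[str], pids: List[int]) -> Dict[int, int]:
    """Count how many SIGCONTs each process receives for a given event order."""
    appears_stopped = {pid: True for pid in pids}  # the shell's bookkeeping
    received = {pid: 0 for pid in pids}
    for event in events:
        if event == "killpg":
            # Group send: every member of the process group gets one copy.
            for pid in pids:
                received[pid] += 1
        elif event == "status_cleared":
            # The stopped status the loop tests is cleared (assumed one step).
            for pid in pids:
                appears_stopped[pid] = False
        elif event == "loop":
            # Per-process loop: re-sends only to processes still marked stopped.
            for pid in pids:
                if appears_stopped[pid]:
                    received[pid] += 1
        else:
            raise ValueError(f"unknown event: {event!r}")
    return received
```

Here `sigcont_copies(LINUX_ORDER, [101, 102])` returns `{101: 1, 102: 1}`, while `sigcont_copies(OSX_ORDER, [101, 102])` returns `{101: 2, 102: 2}`.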



