Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author

PATCH: Rimmerworld pipeline race

X-seq: zsh-workers 43408
From: Peter Stephenson <p.stephenson@xxxxxxxxxxx>
To: Zsh hackers list <zsh-workers@xxxxxxx>
Subject: PATCH: Rimmerworld pipeline race
Date: Fri, 7 Sep 2018 15:01:40 +0100
Cms-type: 201P
Dkim-filter: OpenDKIM Filter v2.11.0 mailout2.w1.samsung.com 20180907140144euoutp0233f5ce4ceaef2866a58b2f443ab5f0ea~SIv_0q-690558405584euoutp02K
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=samsung.com; s=mail20170921; t=1536328904; bh=GeOKKmJnd7w5vPvjntmUKHhy1FNKtqz89k+Nkc//AEE=; h=Date:From:To:Subject:In-Reply-To:References:From; b=JSf4/j2G3REYwSVnA4qJp8kVnzoZJ5AoyoLQlnBcXFxoEPiAnf/uYvXUrI8IzyDFD dxAxlnKTsMLxqrj7xq8Lq8MTsVFxaXOFxHp3eU2fIlLSTHNSA0+asFpG7xszPTHeZa Cf6wgITafYUxTiuWsrk9InefkVl2EAg2M7mrUQIY=
In-reply-to: <20180907122145.2af5bcba@camnpupstephen.cam.scsc.local>
List-help: <mailto:zsh-workers-help@zsh.org>
List-id: Zsh Workers List <zsh-workers.zsh.org>
List-post: <mailto:zsh-workers@zsh.org>
List-unsubscribe: <mailto:zsh-workers-unsubscribe@zsh.org>
Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
Organization: SCSC
References: <CGME20180905200816epcas5p18ce6c49baa637e7d83a769e97c4364fb@epcas5p1.samsung.com> <20180905210740.5a6aec15@pws-HP.localdomain> <20180906090902.1f344e9f@camnpupstephen.cam.scsc.local> <20180906092250eucas1p13d651e07ae627d179dd0701e65f912d6~RxTLztJrP2263722637eucas1p12@eucas1p1.samsung.com> <CAH+w=7ZH1d8RZaHNVhb2k+RqBb_8kgxDurewtk1a6TWG9Vvh5Q@mail.gmail.com> <20180907101852.62415ff9@camnpupstephen.cam.scsc.local> <20180907122145.2af5bcba@camnpupstephen.cam.scsc.local>

OK, this is *much* closer and possibly correct.

When adding a new process to the list in the main shell, we looked to
see if the group leader for that job was empty and if so marked the new
process as the group leader.

This conflicts with entersubsh(), where there are hairy list-pipe tests
to determine what will actually be used as the process group leader
(look in the block controlled by the ESUB_PGRP test --- that's where I
was fiddling in the last patch, but this bit isn't the one that's
wrong).

Most of the time this isn't a problem, but if we decide in the main
shell that that process exited while that group was in control of the terminal,
the main shell will think it's time for it to reclaim the terminal.
Whatever happens to be running in the foreground will then find it's
been pushed into the background.

This showed up in this case

  echo help | { cat | more }

because

- The echo had set the process group in a list_pipe_job fashion.

- The cat was in that process group and the bogus logic made the parent
shell think it had its own process group.

- The more put itself in the same process group, too.

- The cat exited.  The following logic

		if (WIFEXITED(status) &&
		    pn->pid == jn->gleader &&
		    killpg(pn->pid, 0) == -1) {
		    jn->gleader = 0;
		    if (!(jn->stat & STAT_NOSTTY)) {
			/*
			 * This PID was in control of the terminal;
			 * reclaim terminal now it has exited.
			 * It's still possible some future forked
			 * process of this job will become group
			 * leader, however.
			 */
			attachtty(mypgrp);
		    }
		}

  then triggered, grabbing back the terminal.  The more found itself
  without a terminal.

The fix is to second-guess the test the process itself makes when
setting the group leader and to add some clear comments.  Then
jn->gleader in the hunk above correctly reflects the actual foreground
process group, so the shell doesn't reclaim the terminal yet.

Neither of the two related bits, in entersubsh() or addproc(), is new,
increasing my suspicion that the problem isn't fundamentally new but a
newly exposed race.

It's hard to think how fixing this can make things worse (although I
always say this).  How about committing it, making a new test build,
encouraging EVERYONE to try it out, and seeing if that is releasable?

pws


diff --git a/Src/exec.c b/Src/exec.c
index 09ee132..8062f0c 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -994,6 +994,31 @@ enum {
     ESUB_JOB_CONTROL = 0x40
 };
 
+
+/*
+ * If the list_pipe magic is in effect, its process group controls
+ * the terminal: return this.  Otherwise, return -1.
+ *
+ * C.f. the test in entersubsh() for process group of forked subshell.
+ */
+
+/**/
+int
+get_list_pipe_process_group(void)
+{
+    int gleader;
+    if (!list_pipe && !list_pipe_child)
+	return -1;
+    gleader = jobtab[list_pipe_job].gleader;
+
+    if (gleader &&
+	killpg(gleader, 0) != -1 &&
+	gettygrp() == gleader)
+	return gleader;
+
+    return -1;
+}
+
 /**/
 static void
 entersubsh(int flags)
diff --git a/Src/jobs.c b/Src/jobs.c
index 38b3d89..c841150 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -1392,10 +1392,21 @@ addproc(pid_t pid, char *text, int aux, struct timeval *bgtime)
     if (!aux)
     {
 	pn->bgtime = *bgtime;
-	/* if this is the first process we are adding to *
-	 * the job, then it's the group leader.          */
-	if (!jobtab[thisjob].gleader)
-	    jobtab[thisjob].gleader = pid;
+	/*
+	 * If this is the first process we are adding to
+	 * the job, then it's the group leader.
+	 *
+	 * Exception: if the list_pipe_job pipeline madness is in
+	 * effect, we need to use the process group that's already
+	 * controlling the terminal or we'll attach to something
+	 * bogus when that process exits.  This test mimics the
+	 * test the process itself makes in entersubsh(), so the
+	 * two should be kept in sync.
+	 */
+	if (!jobtab[thisjob].gleader) {
+	    int gleader = get_list_pipe_process_group();
+	    jobtab[thisjob].gleader = (gleader == -1) ? pid : gleader;
+	}
 	/* attach this process to end of process list of current job */
 	pnlist = &jobtab[thisjob].procs;
     }

References:
- PATCH: job control debug
  - From: Peter Stephenson
- Re: PATCH: job control debug
  - From: Peter Stephenson
- Re: PATCH: job control debug
  - From: Bart Schaefer

Messages sorted by: Reverse Date, Date, Thread, Author