Re: Inconsistent behavior of ERR_EXIT with conditionals

> All of 1,3,4 are fixed by my patch in workers/50897

That's great! It's a small win but I actually (re-)stumbled on the problem of conditional expressions because of one of these cases. So it's good to see that they can be fixed.

My ultimate goal is to be able to run Zsh scripts with the guarantee that if any command unexpectedly fails (i.e., if any command whose result is not otherwise checked returns a non-zero exit status), then the whole script (not just some subshells) stops immediately. Wouldn't you agree that this would be a useful feature?

The question is how this can be achieved. On the surface, it looks like enabling ERR_EXIT does the trick. However, there are several cases where ERR_EXIT fails to do the job. These cases fall into two categories:

Non-triggering: In some contexts, commands whose result is not otherwise checked don't trigger a shell exit when they return a non-zero exit status even when ERR_EXIT is enabled, e.g., the "false" command in "{false; true} && true" doesn't trigger a shell exit.
Non-propagation: In some contexts, errors in subshells don't propagate to the parent shell, e.g., the "false" in "local var=$(false)" triggers an exit in the subshell of the command substitution but the assignment ignores the result of the command substitution and thus the parent shell fails to exit in turn.
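
To make the two categories concrete, here is a minimal sketch that runs each case in a child shell and reports whether it survived. It uses the POSIX spelling "set -e" (which ERR_EXIT corresponds to) and POSIX brace syntax so that it can be pointed at any shell; the SHELL_UNDER_TEST variable is just a name I made up for this sketch, and "local" is assumed to be supported by the probed shell (it is in Zsh, bash, and dash).

```shell
# Shell to probe. Assumption: defaults to "sh"; set SHELL_UNDER_TEST=zsh to probe Zsh.
shell=${SHELL_UNDER_TEST:-sh}

# Non-triggering: "false" sits inside the left-hand side of an AND-OR list,
# so -e is ignored for it and the script keeps running.
non_triggering=$("$shell" -c 'set -e; { false; true; } && true; echo survived') || true

# Non-propagation: the command substitution fails, but the status of the
# whole command is that of "local", which succeeds.
non_propagation=$("$shell" -c 'set -e; f() { local var=$(false); echo survived; }; f') || true

# Contrast: a plain assignment takes the status of the substitution, so -e fires.
plain_assignment=$("$shell" -c 'set -e; var=$(false); echo survived') || true

printf 'non-triggering:   %s\n' "${non_triggering:-exited}"
printf 'non-propagation:  %s\n' "${non_propagation:-exited}"
printf 'plain assignment: %s\n' "${plain_assignment:-exited}"
```

I would expect the first two lines to report "survived" and only the last one to report "exited".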

I had hoped that some of these cases could be "fixed", but I have now checked the POSIX specification and, as you both pointed out, for most of them POSIX specifies that they have to work as they currently do (this doesn't include Lawrence's examples 1, 3, and 4, which should indeed be fixed).

> The first developer is wrong. That's not what -e is for. A script
> should be correct WITHOUT the use of -e ... the purpose of -e is to
> uncover cases where the developer made a mistake, not to be an
> integral part of the script function.

I can agree with that, but consider that the developer's mistake was to use a ";" instead of an "&&" in the "backup" function. My broader point was that the same error (or developer mistake) in a function "foo" triggers an exit if "foo" is called from a plain statement but not if it's called from within a condition. Wouldn't you agree that it's unfortunate that the same error/mistake may or may not trigger an exit depending on whether it's executed outside or inside a condition?
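
A minimal sketch of that asymmetry, with a hypothetical function "foo" standing in for "backup" (its body is invented for illustration; only the ";" where an "&&" was intended matters). As before, it uses "set -e" in a child shell, and SHELL_UNDER_TEST is a made-up variable name for selecting the shell to probe.

```shell
# Assumption: defaults to "sh"; set SHELL_UNDER_TEST=zsh to probe Zsh.
shell=${SHELL_UNDER_TEST:-sh}

# Hypothetical function with the mistake: ";" where "&&" was intended,
# so the second command does not depend on the first one succeeding.
prog='set -e
foo() { false; echo "foo continued"; }
'

# Called as a plain statement: "false" triggers the exit inside foo.
plain=$("$shell" -c "$prog"'foo; echo "script continued"') || true

# Called as a condition: -e is ignored for the whole body of foo.
cond=$("$shell" -c "$prog"'if foo; then echo "script continued"; fi') || true

printf 'plain statement: [%s]\n' "$plain"
printf 'in a condition:  [%s]\n' "$cond"
```

In the first call nothing should be printed by the child shell at all, whereas in the second call both "foo continued" and "script continued" should appear.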

> Again, wrong. "{ false; true }" is a single statement because of the
> braces. When that statement is followed by || the result of the
> ENTIRE statement is considered to have been "checked".
> Similarly, in "if false; true; then" the conditional part is
> considered as a single statement whose result is "checked".

Indeed, POSIX states "The -e setting shall be ignored when executing the compound list following the while, until, if, or elif reserved word, a pipeline beginning with the ! reserved word, or any command of an AND-OR list other than the last.", so there is unfortunately no way this can be changed, at least in the context of ERR_EXIT.
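
For reference, the contexts listed in that sentence can be probed one by one with a small sketch like the following. The "probe" helper and the SHELL_UNDER_TEST variable are names invented for this sketch; each snippet is run under "set -e" in a child shell, and the last probe is a counter-example where the failing command is the last one of the AND-OR list.

```shell
# Assumption: defaults to "sh"; set SHELL_UNDER_TEST=zsh to probe Zsh.
shell=${SHELL_UNDER_TEST:-sh}

# Run a snippet under -e; print "survived" if the shell got past it, else "exited".
probe() { "$shell" -c "set -e; $1; echo survived" 2>/dev/null || echo exited; }

r_if=$(probe 'if false; then :; fi')          # compound list after "if"
r_while=$(probe 'while false; do :; done')    # compound list after "while"
r_until=$(probe 'until false; true; do :; done')  # compound list after "until"
r_not=$(probe '! true')                       # pipeline beginning with "!"
r_andor_first=$(probe 'false && true')        # non-last command of an AND-OR list
r_andor_last=$(probe 'true && false')         # last command of an AND-OR list

for name in r_if r_while r_until r_not r_andor_first r_andor_last; do
    eval "printf '%-14s %s\n' \"$name\" \"\$$name\""
done
```

Only the last probe should report "exited"; all the others fall under the exemptions quoted above.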

Is all hope lost? Not necessarily. The non-propagation issues can be worked around. That's what my zabort does, by configuring a ZERR trap that forcibly kills all parent shells from within the subshell where the error occurred. Unfortunately, I don't see how the non-triggering issues could be worked around. For these, some change is needed in Zsh, but I agree that changing the behavior of ERR_EXIT isn't the way to go, as it should remain POSIX compliant. What could work is to implement a new shell option, ERR_EXIT_STRICT, which triggers an exit on any command that returns a non-zero exit status and whose result isn't otherwise checked. Only one of ERR_EXIT and ERR_EXIT_STRICT could be enabled at any given time.
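
To illustrate the idea behind the non-propagation workaround, here is a rough sketch of the "kill the main shell from within the subshell" mechanism. It is not zabort's actual code: the "abort" helper and the SHELL_UNDER_TEST variable are invented for this sketch, and the abort is called explicitly with "||" rather than from a ZERR trap, so that the snippet also runs in shells that have no such trap.

```shell
# Assumption: defaults to "sh"; set SHELL_UNDER_TEST=zsh to probe Zsh.
shell=${SHELL_UNDER_TEST:-sh}

prog='
set -e
trap "exit 1" TERM                        # main shell: treat TERM as "abort now"
abort() { kill -s TERM "$$"; exit 1; }    # "$$" stays the main shell PID inside subshells
f() { local var=$(false || abort); echo survived; }
f
echo "script continued"
'

# Run the program in a child shell and record what it printed and how it ended.
status=0
output=$("$shell" -c "$prog") || status=$?

printf 'output: [%s], exit status: %s\n' "$output" "$status"
```

Without the abort call, "local" would hide the failure and both messages would be printed. With it, the main shell should stop with a non-zero exit status before printing anything.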

Would you agree to add a new shell option if it allowed Zsh scripts to be run such that, if any command unexpectedly fails, the script stops immediately (and if its implementation doesn't require overly complex changes)? If so, I may look into implementing it.

Philippe