Zsh Mailing List Archive
zsh/bash behavior variance: regex ERE matching
- X-seq: zsh-workers 42458
- From: Phil Pennock <zsh-workers+phil.pennock@xxxxxxxxxxxx>
- To: zsh-workers@xxxxxxx
- Subject: zsh/bash behavior variance: regex ERE matching
- Date: Tue, 13 Mar 2018 22:40:33 -0400
- List-help: <mailto:zsh-workers-help@zsh.org>
- List-id: Zsh Workers List <zsh-workers.zsh.org>
- List-post: <mailto:zsh-workers@zsh.org>
- List-unsubscribe: <mailto:zsh-workers-unsubscribe@zsh.org>
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
- Openpgp: url=https://www.security.spodhuis.org/PGP/keys/0x4D1E900E14C1CC04.asc
This is just to note that I have observed a behavior variance. My
proposed solution is to do absolutely nothing, and accept the variance
as "sane in an insane world".
Note that, per my standing practice, I do not cause risk to a code-base
which does not belong to me by reading GPL code of a related code-base,
so I still have not read the Bash code. (I like the GPL and use it
elsewhere, but Zsh isn't GPL and it's not my call to risk that, so I
stubbornly refuse to risk it). Descriptions of bash are based on
surmise from observed behavior.
Background: when Bash copied the Perl-ish `=~` syntax, they declared it
to be an ERE match. When I saw that Bash had added the `=~` comparison
infix operator, I went "that's a good idea" and did likewise for Zsh;
during on-list discussion at the time, the core maintainers expressed a
preference for closer compatibility with Bash, so I wrote the
`zsh/regex` module to do ERE matching and introduced the `re_match_pcre`
option to let folks map `=~` onto our long-standing `-pcre-match` infix
operator. (I think Peter chose to make zsh/regex the default always,
which was very sane.)
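For anyone who has not used the option, here is a rough sketch of how the two engines can be compared. It assumes zsh is installed and, for the "on" case, that zsh was built with the `zsh/pcre` module; `zsh_match` is a hypothetical helper name, not anything shipped with zsh.

```shell
#!/bin/sh
# Hypothetical helper: run one =~ test in a fresh zsh, with RE_MATCH_PCRE
# either set or unset.
# Usage: zsh_match on|off PATTERN SUBJECT
zsh_match() {
    mode=$1
    pattern=$2
    subject=$3
    if ! command -v zsh >/dev/null 2>&1; then
        echo "zsh not installed"
        return
    fi
    case $mode in
        on) opt='setopt re_match_pcre' ;;
        *)  opt='unsetopt re_match_pcre' ;;
    esac
    # -f skips the user's startup files so their options do not interfere.
    # Pattern and subject are passed as positional parameters ($1, $2) so
    # that neither is re-parsed as shell syntax.
    if zsh -f -c "$opt"'; re=$1; [[ $2 =~ $re ]]' zsh_match "$pattern" "$subject" 2>/dev/null; then
        echo "match"
    else
        # Also reported when the pattern is rejected or zsh/pcre is missing.
        echo "no match"
    fi
}

# Lookahead is PCRE syntax, not ERE, so the two modes are expected to differ.
echo "system ERE: $(zsh_match off 'foo(?=bar)' foobar)"
echo "PCRE:       $(zsh_match on 'foo(?=bar)' foobar)"
```

What the first line reports depends on the system regex library, which is the subject of the rest of this message.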
Situation: on macOS (10.12.6, Sierra), the regex library is based on
TRE, not on Henry Spencer's library or any other. Further, re_format(7)
documents a number of features for `REG_ENHANCED` mode, as distinct from
`REG_EXTENDED`. These are Perl-ish/PCRE-ish features such as `\d` for
`[[:digit:]]` and `(?:whatever)` for non-capturing grouping.
Using Zsh 5.4.2 built from Homebrew, which has no relevant patches, the
`=~` operator in Zsh is picking up features documented as `REG_ENHANCED`
when we only ask for `REG_EXTENDED`. Homebrew reports that zsh is:
Built from source on 2018-01-07 at 18:10:37 with: --with-unicode9 --with-gdbm --with-pcre
Specifically, the added features are the two features cited above,
`\d` and `(?:...)`.
So: we ask for ERE, we get ERE+nonstandard.
On the same platform, Bash 4.4.19(1)-release from Homebrew does _NOT_
match with `REG_ENHANCED` features.
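The comparison can be reproduced with something like the sketch below, which runs the same `=~` test in each shell found on the PATH. The helper name `probe` and the sample patterns are illustrative assumptions; the results depend on the platform's regex library and on how each shell was built, so no expected output is given.

```shell
#!/bin/sh
# Hypothetical helper: evaluate [[ SUBJECT =~ PATTERN ]] inside SHELL.
# Usage: probe SHELL PATTERN SUBJECT
probe() {
    shell=$1
    pattern=$2
    subject=$3
    if ! command -v "$shell" >/dev/null 2>&1; then
        echo "not installed"
        return
    fi
    # Pattern and subject are passed as positional parameters ($1, $2) so
    # that neither is re-parsed as shell syntax by the inner shell.
    if "$shell" -c 're=$1; [[ $2 =~ $re ]]' probe "$pattern" "$subject" 2>/dev/null; then
        echo "match"
    else
        # Also reported when the regex library rejects the pattern.
        echo "no match"
    fi
}

# The two REG_ENHANCED features discussed above: \d and (?:...).
for s in zsh bash; do
    printf '%s: \\d -> %s, (?:...) -> %s\n' "$s" \
        "$(probe "$s" '^\d+$' 2018)" \
        "$(probe "$s" '^(?:ab)+$' abab)"
done
```

A "match" for either pattern means the shell's `=~` is accepting syntax beyond plain ERE on that system.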
Best operating hypothesis is:
* Darwin userland bug
* Bash build process has logic to detect broken ERE in system libraries
and use a GNU ERE implementation (or ships with such always?) so that
it's immune from bugs like this
Proposed action: nothing
Reason: most folks aren't familiar enough with regexps to know the
variances, and I suspect a non-trivial number of macOS users are
unwittingly relying upon TRE REG_ENHANCED features. Fixing the
incompatibility (1) risks breaking working user scripts and (2) requires
shipping our own reliable ERE regexp library, and really I just don't
want to go there.
FWIW, somewhere lying around I also have a module which adds zsh/re2 as
a module, using Russ Cox's RE2 engine (as popularized by Go). I suspect
that this would cause more confusion than it would solve, and I think I
dropped it part-way through converting RE_MATCH_PCRE to a compatibility
shim which edits a zsh-specific parameter which defines the engine to be
used and so can be set to any of (regex, re2, pcre). If any of the core
team express interest, I can probably dust that off.
-Phil