Zsh Mailing List Archive
Re: zsh/bash behavior variance: regex ERE matching
- X-seq: zsh-workers 42462
- From: Stephane Chazelas <stephane.chazelas@xxxxxxxxx>
- To: Phil Pennock <zsh-workers+phil.pennock@xxxxxxxxxxxx>
- Subject: Re: zsh/bash behavior variance: regex ERE matching
- Date: Wed, 14 Mar 2018 14:37:24 +0000
- Cc: zsh-workers@xxxxxxx
- In-reply-to: <20180314024032.GB32722@tower.spodhuis.org>
- Mail-followup-to: Phil Pennock <zsh-workers+phil.pennock@xxxxxxxxxxxx>, zsh-workers@xxxxxxx
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
- References: <20180314024032.GB32722@tower.spodhuis.org>
2018-03-13 22:40:33 -0400, Phil Pennock:
[...]
> So: we ask for ERE, we get ERE+nonstandard.
>
> On the same platform, Bash 4.4.19(1)-release from Homebrew does _NOT_
> match with `REG_ENHANCED` features.
[...]
An important note about how bash's =~ works since 3.2 (in 3.1
or with the compat31 option it works more like zsh):
In bash (and to some extent in ksh93 as well, though it's very
buggy there), the shell quoting operators influence the regex
matching, as they do for shell wildcards.
[[ a =~ "." ]] or [[ a =~ \. ]]
actually call regcomp() with a "\." regexp.
To do that, bash needs to parse the regexp, and it does so using
the POSIX ERE syntax.
[[ a =~ \d ]] is the same as [[ a =~ "d" ]] and calls
regcomp() with "d", while [[ a =~ '\d' ]] calls it with
"\\d" (the "\" being shell-quoted results in it being
regexp-escaped).
That means that if you want to use extensions, you need to use
variables or other expansions there (which you leave unquoted).
Like:
re='\d'
[[ a =~ $re ]]
for regcomp() to be called with "\d".
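A minimal sketch of that quoting rule, assuming bash 3.2 or later without
the compat31 option. It uses "." rather than \d so that the result does not
depend on regex library extensions; regex_quoting_demo is a hypothetical
name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: how quoting changes the regexp bash hands to regcomp() (bash >= 3.2).
# regex_quoting_demo is a hypothetical helper name, not part of bash.
regex_quoting_demo() {
  local re='a.c'

  # Unquoted expansion: "." stays a regex operator and matches any character.
  if [[ abc =~ $re ]]; then echo "unquoted: match"; else echo "unquoted: no match"; fi

  # Quoted expansion: bash regexp-escapes the ".", so "abc" no longer matches.
  if [[ abc =~ "$re" ]]; then echo "quoted: match"; else echo "quoted: no match"; fi

  # The literal string "a.c" still matches the quoted form.
  if [[ a.c =~ "$re" ]]; then echo "quoted literal: match"; else echo "quoted literal: no match"; fi
}

regex_quoting_demo
```

Under those assumptions this should report a match, then no match, then a
match again, which is why extensions have to come from an unquoted expansion.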
Note that (?:...) and \d are fine. We're not breaking EREs by
supporting them, as the behaviour of (?:...) and \d is unspecified
in the POSIX ERE specification.
Other regexp implementations have other backward-compatible
extensions. For instance, GNU EREs support \b, \<, \>...
Some incompatibilities I'm aware of between ERE and PCRE (I
don't know if that also applies to those macOS REs):
- In POSIX ERE, [\d] matches \ and d, while in PCRE it matches a
digit (see also [\]] and co)
- In POSIX ERE, alternation looks for the longest match, while
PCRE takes the first alternative that matches.
$ echo abc | grep -oE 'a|ab'
ab
$ echo abc | grep -oP 'a|ab'
a
$ [[ abc =~ '(a|ab)' ]]; echo $match
ab
$ setopt rematchpcre
$ [[ abc =~ '(a|ab)' ]]; echo $match
a
As long as the regex library does what is required for
POSIX-compliant regular expressions, and since we document that =~
does POSIX ERE, I'd say it doesn't matter which extensions are
implemented on top of the standard.
--
Stephane