Zsh Mailing List Archive

Re: regexp-replace and ^, word boundary or look-behind operators

X-seq: zsh-workers 45054
From: Stephane Chazelas <stephane.chazelas@xxxxxxxxx>
To: Zsh hackers list <zsh-workers@xxxxxxx>
Subject: Re: regexp-replace and ^, word boundary or look-behind operators
Date: Mon, 16 Dec 2019 21:27:06 +0000
In-reply-to: <20191216211013.6opkv5sy4wvp3yn2@chaz.gmail.com>
List-help: <mailto:zsh-workers-help@zsh.org>
List-id: Zsh Workers List <zsh-workers.zsh.org>
List-post: <mailto:zsh-workers@zsh.org>
List-unsubscribe: <mailto:zsh-workers-unsubscribe@zsh.org>
Mail-followup-to: Zsh hackers list <zsh-workers@xxxxxxx>
Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
References: <20191216211013.6opkv5sy4wvp3yn2@chaz.gmail.com>

2019-12-16 21:10:13 +0000, Stephane Chazelas:
> The way regexp-replace works means that these things:
> 
> $ a='aaab'; regexp-replace a '^a' x; echo "$a"
> xxxb
> $ a='abab'; regexp-replace a '\<ab' '<$MATCH>'; echo $a
> <ab><ab>
> $ set -o rematchpcre
> $ a=xxx; regexp-replace a '(?<!x)x' y; echo $a
> yyy
[...]
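The sketch below is a minimal Python model, not zsh code, of the failure mode. It uses a replace loop that, after each substitution, searches the remaining tail of the string again as though it were a fresh string. It assumes this is roughly what regexp-replace does, which is what the outputs above suggest. `replace_on_tail` is a hypothetical name. Python's `re` stands in for zsh's regex back ends, and `\b` replaces `\<`, which `re` does not support.

```python
import re
from typing import Callable


def replace_on_tail(subject: str, pattern: str, repl: Callable[[str], str]) -> str:
    # Hypothetical model of the problematic loop: every search runs on a
    # *slice* of the string, so the engine cannot see the consumed prefix.
    # '^' therefore matches at the start of each tail, and '\b' and
    # lookbehinds behave as if the tail were the whole string.
    regex = re.compile(pattern)
    out = []
    tail = subject
    while tail:
        m = regex.search(tail)
        if m is None:
            break
        out.append(tail[:m.start()])
        out.append(repl(m.group(0)))
        if m.end() == m.start():
            # Empty match: copy one character over so the loop makes progress.
            out.append(tail[m.start():m.start() + 1])
            tail = tail[m.start() + 1:]
        else:
            tail = tail[m.end():]
    out.append(tail)
    return "".join(out)


if __name__ == "__main__":
    print(replace_on_tail("aaab", r"^a", lambda s: "x"))            # xxxb
    print(replace_on_tail("abab", r"\bab", lambda s: f"<{s}>"))     # <ab><ab>
    print(replace_on_tail("xxx", r"(?<!x)x", lambda s: "y"))        # yyy
```

All three outputs match the quoted zsh results, even though `re.sub` itself gives `xaab`, `<ab>ab` and `yxx` for the same inputs.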

FWIW, it looks like some sed implementations (such as those of the
Heirloom Toolchest and busybox), as well as ksh93, have the same problem:

$ echo xxx | busybox sed 's/\<x/y/g'
yyy
$ a=xxx ksh -c 'echo ${a//~(E:^x)/y}'
yyy
$ a=xxx ksh -c 'echo ${a//[[:<:]]x/y}'
yyy

It may be that the POSIX regex API has no way to fix that: the
REG_NOTBOL flag to regexec() addresses the ^ case, but there is
nothing for \< / \b / [[:<:]], which are non-POSIX extensions anyway.

PCRE should be OK, so it could just be a matter of exposing it
via the pcre_match builtin and otherwise documenting the
limitation for EREs (PCRE is the new de facto standard
anyway).
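PCRE can avoid the problem because its matcher takes a start offset into the whole subject (the `start_offset` argument of `pcre_exec()`) rather than a shortened string. Lookbehind assertions and `\b` can then examine characters before the offset, and, without multiline mode, `^` cannot match at a non-zero offset. Python's `Pattern.search(string, pos)` behaves the same way, so the minimal sketch below uses it as a stand-in for PCRE. `replace_with_offset` is a hypothetical helper, not an existing zsh or PCRE function.

```python
import re
from typing import Callable


def replace_with_offset(subject: str, pattern: str, repl: Callable[[str], str]) -> str:
    # Hypothetical global replace that always searches the *whole* subject,
    # starting at offset `pos`. The engine still sees the text before `pos`,
    # so '^' only matches at the real start of the string, and '\b' and
    # lookbehinds take the already processed characters into account.
    regex = re.compile(pattern)
    out = []
    pos = 0   # where the next search starts
    last = 0  # end of the text already copied to `out`
    while pos <= len(subject):
        m = regex.search(subject, pos)
        if m is None:
            break
        out.append(subject[last:m.start()])
        out.append(repl(m.group(0)))
        last = m.end()
        # After an empty match, move one character ahead to avoid looping forever.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    out.append(subject[last:])
    return "".join(out)


if __name__ == "__main__":
    print(replace_with_offset("aaab", r"^a", lambda s: "x"))          # xaab
    print(replace_with_offset("abab", r"\bab", lambda s: f"<{s}>"))   # <ab>ab
    print(replace_with_offset("xxx", r"(?<!x)x", lambda s: "y"))      # yxx
```

For these inputs, the results agree with `re.sub`. Here `\b` stands in for `\<`, which Python's `re` lacks.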

-- 
Stephane

Follow-Ups:
- Re: regexp-replace and ^, word boundary or look-behind operators
  - From: Stephane Chazelas

References:
- regexp-replace and ^, word boundary or look-behind operators
  - From: Stephane Chazelas

Messages sorted by: Reverse Date, Date, Thread, Author