Zsh Mailing List Archive
Re: please consider using PCRE_DOLLAR_ENDONLY (and PCRE_DOTALL) for rematchpcre
- X-seq: zsh-workers 42319
- From: Stephane Chazelas <stephane.chazelas@xxxxxxxxx>
- To: Phil Pennock <zsh-workers+phil.pennock@xxxxxxxxxxxx>
- Subject: Re: please consider using PCRE_DOLLAR_ENDONLY (and PCRE_DOTALL) for rematchpcre
- Date: Tue, 23 Jan 2018 06:57:35 +0000
- Cc: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>, zsh-workers@xxxxxxx
- In-reply-to: <20180122052829.GA83799@tower.spodhuis.org>
- List-help: <mailto:zsh-workers-help@zsh.org>
- List-id: Zsh Workers List <zsh-workers.zsh.org>
- List-post: <mailto:zsh-workers@zsh.org>
- List-unsubscribe: <mailto:zsh-workers-unsubscribe@zsh.org>
- Mail-followup-to: Phil Pennock <zsh-workers+phil.pennock@xxxxxxxxxxxx>, Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>, zsh-workers@xxxxxxx
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
- References: <20171122122519.GA13771@chaz.gmail.com> <20171122214025.GA2992@chaz.gmail.com> <180119234824.ZM7254@torch.brasslantern.com> <20180122052829.GA83799@tower.spodhuis.org>
2018-01-22 00:28:29 -0500, Phil Pennock:
[...]
> Changing the default behavior of valid semantics risks hard-to-debug
> breakage of existing scripts and I am erring on the side of being
> against this change. It's not hard opposition, but I'd like to see
> stronger justification before risking breaking changes.
>
> I know that I myself have scripts which rely upon PCRE matching against
> multiline data behaving as per the defaults of pcrepattern(3).
>
> In addition, while the DOTALL change can be turned off in-regex, the
> dollar-endonly one can't, AFAIK, so that becomes a breaking change which
> can't be worked around.
[...]
dollar-endonly is not really about multiline:
    [[ $'a\nb' =~ 'a$' ]]
will not match with or without it, and
    [[ $'a\nb' =~ '(?m)a$' ]]
will match with or without it.
It's more about single-line where the line delimiter happens to
be included (and you want the $ to match on the end of that line
as opposed to the end of the string).
$ matches before a trailing newline in a string in perl because
of how its <> operator works. perl is a text processing utility,
and its regexps are primarily matched against single lines where
the newline is included (contrary to traditional text processing
utilities like sed/grep/awk where the record separator is not
included).
In:
    perl -pe 's/.$//'
(which calls <>), you want to remove the last character of the
line, not the newline character.
That $ behaviour makes a lot of sense there. Even if you use:
    perl -lpe 's/.$//'
where that -l causes the delimiter to be removed on input and
added back on output like in sed/awk, that behaviour does no
harm because the record does *not* contain any newline
delimiter.
But zsh is not a text processing utility, and its "read" builtin
(the closest equivalent to perl's <>) does not include the
delimiter. It's actually hard to have a trailing newline when
processing text in shells given that $(...) strips them.
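
A minimal sketch of that point, using only standard shell features (the sample strings are made up). The last two lines show a common workaround for keeping a trailing newline, which is not something the original message proposes.

```shell
# read drops the line delimiter: the variable holds "hello", not "hello\n".
IFS= read -r line <<'EOF'
hello
EOF
read_len=${#line}

# $(...) strips all trailing newlines from the captured output.
captured=$(printf 'hello\n\n\n')
captured_len=${#captured}

# Workaround: append a sentinel character, then remove it again.
kept=$(printf 'hello\n'; printf x)
kept=${kept%x}
kept_len=${#kept}

printf '%s %s %s\n' "$read_len" "$captured_len" "$kept_len"
```

The three lengths come out as 5, 5 and 6: only the sentinel trick preserves the newline.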
On the other hand, having
    [[ $file =~ '\.txt$' ]]
match on files that don't end in .txt is a concern (and in my
experience, file names, as opposed to text lines with
delimiters, are the kind of thing I deal with most often in zsh).
And again, note that it only happens with rematchpcre; it works as
expected with EREs.
--
Stephane