Zsh Mailing List Archive

Re: please consider using PCRE_DOLLAR_ENDONLY (and PCRE_DOTALL) for rematchpcre



2017-11-22 12:25:19 +0000, Stephane Chazelas:
[...]
> It can be worked around ([[ $a =~ 'a\z' ]], [[ $a =~ '(?s).'
> ]]), but IMO at least PCRE_DOLLAR_ENDONLY (if not PCRE_DOTALL)
> should be the default at least for [[ $string =~ ... ]] as
> in shells, $string usually do not include the newline delimiter.
[...]

The situation in other tools and languages:

ksh93:

$ ksh93 -c "[[ $'a\n' = ~(P:a$) ]] || echo no; [[ $'\n' = ~(P:.) ]] && echo yes"
no
yes


(both PCRE_DOLLAR_ENDONLY and PCRE_DOTALL (or equivalent as
ksh93 comes with its own pcre-like implementation))

$ php -r 'echo preg_match("/a$/", "a\n") . "\n" . preg_match("/./", "\n") . "\n";'
1
0

neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL. Clearly documented
and has a "D" flag to enable PCRE_DOLLAR_ENDONLY
https://secure.php.net/manual/en/reference.pcre.pattern.modifiers.php

$ php -r 'echo preg_match("/a$/D", "a\n") . "\n";'
0

ssed:

printf 'a\n\n' | ssed -Rn 'N;/a$/=;/a./!='

neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL

GNU grep:

$ printf 'a\n\0' | ltrace -e 'pcre_compile' grep -zP 'a$'
grep->pcre_compile("a$", 2080, 0x7ffcaf25aff8, 0x7ffcaf25aff4, 0x1e89280)

PCRE_DOLLAR_ENDONLY (32) but not PCRE_DOTALL
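
The 2080 passed to pcre_compile() above can be decoded by hand as 2048 + 32. A small Python sketch of that arithmetic follows. The option values are assumed to be those of PCRE1's pcre.h (worth checking against the local header), only three options are listed, and FLAGS and decode() are illustrative names, not part of any library:

```python
# Option bits assumed from PCRE1's pcre.h; verify against your own header.
FLAGS = {
    "PCRE_DOTALL": 0x0004,
    "PCRE_DOLLAR_ENDONLY": 0x0020,
    "PCRE_UTF8": 0x0800,
}

def decode(options):
    """Return the names of the known option bits that are set in `options`."""
    return sorted(name for name, bit in FLAGS.items() if options & bit)

# 2080 == 0x820, so the 0x20 bit is set and the 0x4 bit is not.
print(decode(2080))
```

Under those assumed values, the remaining 2048 would be PCRE_UTF8, which is consistent with grep running in a UTF-8 locale.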

python (not PCRE):

neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL. Documented:
https://docs.python.org/3/library/re.html

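Note that \Z does not mean the same as in perl/PCRE! In python, \Z
matches at the very end of the subject only (like perl/PCRE's \z),
whereas perl/PCRE's \Z also matches before a trailing newline.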
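
For comparison with the ksh93 and php one-liners above, here is a minimal sketch of the same checks using Python's standard re module with its default flags:

```python
import re

# By default, $ also matches just before a trailing newline.
print(bool(re.search(r"a$", "a\n")))     # True

# By default, . does not match a newline ...
print(bool(re.search(r".", "\n")))       # False
# ... unless re.DOTALL or the inline (?s) flag is used.
print(bool(re.search(r"(?s).", "\n")))   # True

# \Z matches only at the very end of the string.
print(bool(re.search(r"a\Z", "a\n")))    # False
```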

fish (string match -r pcre strings...):

neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL

So I'd understand if you leave it as it is as many other tools
do not use PCRE_DOLLAR_ENDONLY.

I still find the idea of $ not matching only at the end of the
subject dangerous, as most people assume it does (like it does
in BRE and ERE). If not changed, it would be worth clearly
documenting (if only to flag the difference with ERE and warn of
potential implications). See how the documentation currently has
this misleading example:

  [[ "$text" -pcre-match ^d+$ ]] &&
  print text variable contains only "d's".

Should be: 

  print text variable contains only "d's" optionally followed by a newline character

or:

  [[ "$text" -pcre-match '^d+\z' ]]


It affects perl and co already. Like, many people do:

rename 's/\.back$//i' ./*

When they meant:

rename 's/\.back\z//i' ./*

Same for PCRE_DOTALL:

rename 's/-.*//' ./*-*

When they meant:

rename 's/(?s)-.*//' ./*-*

for instance.
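
To make the effect of those substitutions on file names containing newlines concrete, here is a rough Python sketch. It does not call rename itself: it only applies equivalent patterns with re.sub(). The file names are made up, \Z is Python's spelling of perl's \z, and count=1 mimics perl's s/// without /g:

```python
import re

def strip_back(name, end_only):
    # perl's \z is spelled \Z in Python's re module.
    pattern = r"\.back\Z" if end_only else r"\.back$"
    return re.sub(pattern, "", name, count=1, flags=re.I)

def strip_after_dash(name, dotall):
    # Inline (?s) lets . match newlines as well.
    pattern = r"(?s)-.*" if dotall else r"-.*"
    return re.sub(pattern, "", name, count=1)

# Hypothetical file name ending in a newline.
print(repr(strip_back("notes.back\n", end_only=False)))  # 'notes\n'
print(repr(strip_back("notes.back\n", end_only=True)))   # 'notes.back\n'

# Hypothetical file name with a newline in the middle.
print(repr(strip_after_dash("a-b\nc-d", dotall=False)))  # 'a\nc-d'
print(repr(strip_after_dash("a-b\nc-d", dotall=True)))   # 'a'
```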

-- 
Stephane


