Zsh Mailing List Archive

Several PCRE module oddities



Replying to all of these PCRE-related items together ...

Preliminary remark:  In none of your test scripts do I see you setting
the RE_MATCH_PCRE option.  You need to set that, in addition to loading
the zsh/pcre module, or [[ str =~ pat ]] continues to use zsh/regex.  I
think that explains your "zsh -f" behavior confusion (see below).
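
For illustration, a minimal sketch of the setup described above, assuming
a zsh built with the zsh/pcre module available.  The test string and the
"engine" variable are made up for the example; \d is PCRE syntax that is
not part of POSIX extended regular expressions, so it is one rough way to
see which engine =~ is using.

```shell
# Sketch only: assumes zsh with the zsh/pcre module available.
zmodload zsh/pcre       # provides pcre_compile, pcre_match, ...
setopt RE_MATCH_PCRE    # without this, =~ keeps using zsh/regex

# "engine" is a hypothetical variable used only to report the result.
engine=regex
if [[ abc123 =~ '\d+' ]]; then
  # \d is a PCRE escape, so a match here suggests PCRE handled =~
  engine=pcre
fi
echo "=~ appears to be using: $engine"
```

Loading the module alone only makes the pcre_* builtins available; it is
the option that switches the =~ operator over.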

On Jul 19,  2:19pm, Moritz Bunkus wrote:
}
} today I stumbled across this paragraph in zsh's info documentation
} again. Citing "Description of Options":
} 
} > CASE_MATCH <D>
} >      Make regular expressions using the zsh/regex module (including
} >      matches with =~) sensitive to case.
} 
} It does not apply to =~ if the zsh/pcre module is loaded.

See the first and third hunks of the patch below, though I suppose we
should get general agreement on whether it should work this way, because
there's no way to turn it off on a per-pattern basis (unlike turning it
*on* with "pcre_compile -i").

Also of course if you pcre_compile with one setting of CASE_MATCH and then
change it before calling pcre_match, you get the behavior from compile
time, so that ought to be explicitly documented.  That doesn't apply to
the inline condition operator, which recompiles every time it's used.


On Jul 19,  2:31pm, Moritz Bunkus wrote:
}
} line=
} if [[ $line =~ '^$' ]] print is empty case 1
} 
} pcre_compile '^$'
} pcre_match "$line" && print is empty case 2
} pcre_match $line   && print is empty case 3
} ----------------------------------------
} 
} 1. =~ matches as expected
} 
} 2. pcre_match "$line" does NOT match and doesn't emit an error message
} 
} 3. pcre_match $line does NOT match either and emits an error message
} 
} This is not only inconsistent but also simply wrong. Both 2. and
} 3. should match, and 2. shouldn't emit an error message.

Actually only (2) is strange here, see second hunk of patch below (there
may be a better way to fix this).  As for examples (1) and (3):

[[ $line =~ ^$ ]] is a special syntactic construct which treats $line
(unquoted parameter reference) as a token before expanding the value.
You also don't need the single-quotes around ^$ for this reason.

The calls to pcre_match, on the other hand, are normal shell commands,
which means the parameter references are expanded, and unquoted values
that expand to the empty string are completely removed from the argument
list, before the command is even invoked.  So "not enough arguments" is
exactly as expected, and completely consistent with other shell commands.
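
To illustrate the argument-removal point with something other than
pcre_match, here is a small sketch.  count_args is a hypothetical helper
that only reports how many arguments it received; it is not part of zsh.

```shell
# count_args is a hypothetical helper: it prints its argument count.
count_args() { echo "$#"; }

line=

quoted=$(count_args "$line")    # empty string is kept as one argument
unquoted=$(count_args $line)    # empty expansion is removed entirely

echo "quoted: $quoted, unquoted: $unquoted"
```

The quoted call sees one (empty) argument and the unquoted call sees
none, which is the same situation pcre_match is in for cases 2 and 3.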


On Jul 19,  2:36pm, Moritz Bunkus wrote:
}
} This case gets even weirder. The previous output I posted was gathered
} from zsh running with my normal RC files. The output actually differs if
} run with -f:
} 
} Meaning without any RCs case 2 matches, too!

This is because you haven't set the RE_MATCH_PCRE option, so zsh/regex
is being used for case 2.


Here's the patch.

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index cb9f8ef..2333438 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -87,6 +87,8 @@ bin_pcre_compile(char *nam, char **args, Options ops, UNUSED(int func))
     
     if (zpcre_utf8_enabled())
 	pcre_opts |= PCRE_UTF8;
+    if (!isset(CASEMATCH))
+	pcre_opts |= PCRE_CASELESS;
 
     pcre_hints = NULL;  /* Is this necessary? */
     
@@ -311,7 +313,8 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
     unmetafy(plaintext, NULL);
     subject_len = (int)strlen(plaintext);
 
-    if (offset_start < 0 || offset_start >= subject_len)
+    if (offset_start < 0 ||
+	(subject_len ? offset_start >= subject_len : offset_start > 0))
 	ret = PCRE_ERROR_NOMATCH;
     else
 	ret = pcre_exec(pcre_pattern, pcre_hints, plaintext, subject_len, offset_start, 0, ovec, ovecsize);
@@ -345,6 +348,8 @@ cond_pcre_match(char **a, int id)
 
     if (zpcre_utf8_enabled())
 	pcre_opts |= PCRE_UTF8;
+    if (!isset(CASEMATCH))
+	pcre_opts |= PCRE_CASELESS;
 
     lhstr = cond_str(a,0,0);
     rhre = cond_str(a,1,0);


