Peter Stephenson wrote:
No, I ended up keeping the original behavior: on match failure, none of the special variables are modified (reset).

On Wed, 25 Mar 2009 02:20:02 -0700 Jon Strait <jstrait@xxxxxxxxxxxx> wrote:
A few adjustments since last time, with documentation. No reset of the special variables is done on a match failure.

Er, I can't remember what Phil said (I haven't been following this in any detail), but the documentation now says variables aren't altered on a failure, so presumably that is now incorrect. I don't think this is crucial as long as it's documented correctly.

Could you in any case send a documentation patch against the current source and with lines wrapped to 80 columns? Thanks.

Here is the updated doc patch. Please let me know if anything I added isn't clear enough. Thanks.

--- mod_pcre-old.yo  2009-01-15 01:49:06.000000000 -0800
+++ mod_pcre.yo  2009-03-25 03:55:58.000000000 -0700
@@ -6,7 +6,7 @@
 startitem()
 findex(pcre_compile)
-item(tt(pcre_compile) [ tt(-aimx) ] var(PCRE))(
+item(tt(pcre_compile) [ tt(-aimxs) ] var(PCRE))(
 Compiles a perl-compatible regular expression.
 Option tt(-a) will force the pattern to be anchored.
@@ -15,6 +15,8 @@
 tt(^) and tt($) will match newlines within the pattern.
 Option tt(-x) will compile an extended pattern, wherein
 whitespace and tt(#) comments are ignored.
+Option tt(-s) makes the dot metacharacter match all characters,
+including those that indicate newline.
 )
 findex(pcre_study)
 item(tt(pcre_study))(
@@ -22,7 +24,8 @@
 matching.
 )
 findex(pcre_match)
-item(tt(pcre_match) [ tt(-v) var(var) ] [ tt(-a) var(arr) ] var(string))(
+item(tt(pcre_match) [ tt(-v) var(var) ] [ tt(-a) var(arr) ] \
+[ tt(-n) var(offset) ] [ tt(-b) ] var(string))(
 Returns successfully if tt(string) matches the previously-compiled PCRE.
@@ -33,8 +36,38 @@
 case it will set the array var(arr). Similarly, the variable
 var(MATCH) will be set to the entire matched portion of the
 string, unless the tt(-v) option is given, in which case the variable
-var(var) will be set.
-No variables are altered if there is no successful match.
+var(var) will be set.
+No variables are altered if there is no successful match.
+A tt(-n) option starts searching for a match from the
+byte var(offset) position in var(string). If the tt(-b) option is given,
+the variable var(ZPCRE_OP) will be set to an offset pair string,
+representing the byte offset positions of the entire matched portion
+within the var(string). For example, a var(ZPCRE_OP) set to "32 45" indicates
+that the matched portion began on byte offset 32 and ended on byte offset 44.
+Here, byte offset position 45 is the position directly after the matched
+portion. Keep in mind that the byte position isn't necessarily the same
+as the character position when UTF-8 characters are involved.
+Consequently, the byte offset positions are only to be relied on in the
+context of using them for subsequent searches on var(string), using an offset
+position as an argument to the tt(-n) option. This is mostly
+used to implement the "find all non-overlapping matches" functionality.
+
+A simple example of "find all non-overlapping matches":
+
+example(
+string="The following zip codes: 78884 90210 99513"
+pcre_compile -m "\d{5}"
+accum=()
+pcre_match -b -- $string
+while [[ $? -eq 0 ]] do
+    b=($=ZPCRE_OP)
+    accum+=$MATCH
+    pcre_match -b -n $b[2] -- $string
+done
+print -l $accum
+
+
+)
 )
 enditem()