Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: [Bug] S-flag imposes non-greedy match where it shouldn't



Sebastian Gniazdowski wrote on Thu, Dec 26, 2019 at 19:35:05 +0100:
> I've attached the extended description.

Thanks; review below.

> It includes a trick to
> work-around the unintuitive behavior of S. It looks as follows:
> 
> http://psprint.blinkenshell.org/S_flag.png

Please just copy-paste terminal transcripts into the email.  (Static
transcripts, as in the documentation.)

> I think that the way the S flag works is a bit of an inconsistency,
> Because ${str%%X##**} would not stop at the first from the right
> match, it would try other matches starting from the right and go on up
> to the final first from the left X. I think that (S) shouldn't change
> this, but on the other hand should ${(S)str%%X##} match the first
> three X? Rather not, as it would resemble ## then... Intuitively,
> however, it should match all the three right X.

Yes, I don't find the following very intuitive:

% set -- aXbXc
% p ${1%%X*}
a
% p ${(S)1%%X*}
aXb
% p ${(S)1%X*}
aXbc
% 

I expected ${(S)%%} to mean: 'Look for the longest match that ends on the last
character; if you don't find any, then look for the longest match that ends on
the penultimate character; etc, until you finally consider whether $str[1] is a
match and whether ${str[1,0]} is a match'.  However, that's clearly not what it
does here, or ${(S)1%%X*} would have printed «a».  Rather, it seems that
${(S)%} and ${(S)%%} mean 'Find the match whose _start_ is closest to the end
of the string; of all matches that start at a particular index, ${(S)%} picks
the shortest and ${(S)%%} the longest.'.

> +++ b/Doc/Zsh/expn.yo
> @@ -1399,6 +1399,20 @@ from the beginning and with tt(%) start from the end of the string.
>  With substitution via tt(${)...tt(/)...tt(}) or
>  tt(${)...tt(//)...tt(}), specifies non-greedy matching, i.e. that the
>  shortest instead of the longest match should be replaced.
> +The substring search means that the pattern is matched skipping the
> +parts of the input string starting from the direction set by the use
> +of tt(#) or tt(%).

I don't understand this sentence.  What does "skipping" mean?

Documentation should be clear and specific enough to allow acceptance
tests to be based on it.

> +For example, to match a pattern starting from the
> +end, one could use:
> +
> +example(str="abcXXXdefXXXghi"
> +out=${(S)str%%(#b)([^X])X##}
> +out=$out${match[1]}
> +)
> +
> +The result is tt(abcXXXdefghi).

That's not correct.  The output is abcXXXdefXXXghi (in 'zsh -f') or
abcXXXdeghif (with extendedglob set), but not abcXXXdefghi.

I doubt this example would clarify the meaning of ${(S)} to people who
encounter it for the first time.  Please use a more minimal example.
Specific issues:

- Assigning to $out a concatenation of two different values muddies the water.
  It forces readers to reverse engineer which parts of the resultant value come
  from ${match[1]} and which from the ${(S)%%}.  This is documentation, not
  a homework problem; the answer should be obvious.  Something like
  «out="${out}+${match[1]}"» would address this — but…

- … the use of advanced pattern matching features needlessly raises the
  learning curve.  For example, the use of «##» doesn't affect the behaviour
  of the example in any meaningful way, but it has two downsides: it means the
  example won't work out of the box when people paste it into their shell, and
  it means people who RTFM about (S) won't be able to understand it until they
  also look up what «##» does [which in turn means they'll have to open
  zshoptions(1) to RTFM about EXTENDED_GLOB].  This mostly applies to the use
  of (#b) and capture groups too: it would be better not to assume knowledge
  of that.

> It would have been tt(abcXXXdefXXghif)
> +if not the tt([^X]) part, as despite the tt(%%) specifies a greedy
> +match, the substring matching works by trying matches from right to
> +left and stops at a first valid match.

There are some grammatical errors here (e.g., s/(?<=specif)ies/ying/), but
let's not worry about them until the rest of the patch isn't a moving target.

Thanks for the patch.  I look forward to a v2.

Daniel

P.S. Obviously, I meant to write «s/specifies/specifying/» — but I wanted to
illustrate the point that no more knowledge of pattern-matching syntax should
be assumed than necessary.  [It was a positive lookbehind assertion Perl
syntax.]



Messages sorted by: Reverse Date, Date, Thread, Author