Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Strange substring search behaviour



Peter Stephenson wrote:
> In fact, the internals are pretty much all there to be able to replace
> the shortest match instead of the longest match for the pattern.  The
> only thing missing is the syntax.

I decided on a syntax:  the S (substring) flag selects the shortest
match; that flag has no other meaning in substitutions.

However, I discovered an ambiguity I wasn't aware of.  The form
${(S)foo#bar} is supposed to find substrings in $foo, using the
shortest match (## would give the longest match).  In the examples
below, the M flag means print the portion actually matched rather than
the string with that portion deleted; it doesn't affect what actually
matches.  But:

% foo="twinkle twinkle little star"
% print ${(M)foo#t*e}                # shortest match of t*e at head
twinkle                              # so far so good
% print ${(MS)foo#t*e}               # same but look for substrings
tle

This surprised me.  I would have expected it to start from the head,
and look for the shortest string that matches there, and carry on down
the string looking for the shortest match from any position.  Instead
it looks for the shortest *possible* match *anywhere*.  Maybe I should
have guessed?  It makes it difficult for shortest-match substitution,
since that has to start from the beginning and go down the string
(i.e., I wanted ${(S)foo//t*e/spy} to print `spy spy lispy star' and
this posting came about because it didn't).

Furthermore, this makes it a little strange when used with the I.n. flag,
which tells you to use the n'th match.

% print ${(MSI.1.)foo#t*e} 
tle                                  # first match: shortest
% print ${(MSI.2.)foo#t*e} 
ttle                                 # second match: second shortest
% print ${(MSI.3.)foo#t*e} 
twinkle                              # first occurrence of third shortest
% print ${(MSI.4.)foo#t*e} 
twinkle                              # the other twinkle
% print ${(MSI.5.)foo#t*e} 
twinkle little                       # all rather interesting...
% print ${(MSI.6.)foo#t*e}
twinkle twinkle                      # ...in its own way...
% print ${(MSI.7.)foo#t*e} 
twinkle twinkle little               # ...but is it right?
                                     # (in fact, that's the *longest* match).
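
The ordering in this transcript is what one gets by listing every
matching substring shortest first, with ties going to the leftmost
start.  The sketch below illustrates that ordering only; it makes no
claim about how zsh produces it internally.  matches_by_length is a
hypothetical helper, the glob t*e is hard-coded, and the string is
assumed to be single-line ASCII.

```shell
# Hypothetical helper, not part of zsh: print every substring of $1 that
# matches the glob t*e, shortest first, leftmost first among equal lengths.
matches_by_length() {
  s=$1 n=${#1} len=1
  while [ "$len" -le "$n" ]; do
    i=0
    # Slide a window of the current length across the string.
    while [ $((i + len)) -le "$n" ]; do
      sub=$(printf '%s\n' "$s" | cut -c"$((i + 1))-$((i + len))")
      case $sub in
        t*e) printf '%s\n' "$sub" ;;
      esac
      i=$((i + 1))
    done
    len=$((len + 1))
  done
}

matches_by_length "twinkle twinkle little star"
```

For the example string this prints seven lines, in the same order as
the seven results above.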

I would have expected `twinkle', `twinkle', `ttle' and `tle' (the last
has already gone by then if you're doing a global substitution so
doesn't get replaced), i.e. the shortest matches from each position in
order of finding.
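
The expected ordering can be sketched the same way: visit each start
position from the head and report only the shortest match beginning
there.  Again this is an illustration under stated assumptions, not
zsh code: matches_by_position is a hypothetical helper, the glob t*e is
hard-coded, and the string is assumed to be single-line ASCII.

```shell
# Hypothetical helper, not part of zsh: for each start position in $1,
# left to right, print the shortest substring there that matches t*e.
matches_by_position() {
  s=$1 n=${#1} i=0
  while [ "$i" -lt "$n" ]; do
    len=1
    while [ $((i + len)) -le "$n" ]; do
      sub=$(printf '%s\n' "$s" | cut -c"$((i + 1))-$((i + len))")
      case $sub in
        # Stop at the first (shortest) match from this position.
        t*e) printf '%s\n' "$sub"; break ;;
      esac
      len=$((len + 1))
    done
    i=$((i + 1))
  done
}

matches_by_position "twinkle twinkle little star"
```

Unlike a global substitution, this listing does not skip text that an
earlier match has already covered, so the overlapping `tle' still
appears after `ttle'.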

I'd quite like to rewrite the whole thing the way my original
inclinations told me.  Any comments?  In other words, does anyone
think they or anyone else is expecting to find the globally shortest
match first?  Should I ask for a vote on zsh-users?

-- 
Peter Stephenson <pws@xxxxxxxxxxxxxxxxx>       Tel: +39 050 844536
WWW:  http://www.ifh.de/~pws/
Dipartimento di Fisica, Via Buonarroti 2, 56127 Pisa, Italy


