Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Substitution ${...///} slows down when certain UTF character occurs



On Mon, 28 Sep 2015 09:51:42 +0100
Peter Stephenson <p.stephenson@xxxxxxxxxxx> wrote:
> I think a reasonable strategy would be to change the call sequence for
> pattrylen() and pattryrefs(), which are the key ones, to pass in an
> optional unmetafied string; some of the remaining calls in glob.c could
> be promoted to pattrylen() which is a strict superset of pattry().  That
> would leave pattry() untouched for the majority of cases doing one-off
> matching.
> 
> Ideally we only want to pass in either a metafied or unmetafied string.
> I don't know off the top of my head how much work it is to fix up the
> PAT_PURES optimisation where we've got an already unmetafied string but
> it shouldn't be too much.

The problem here is that the string compiled into the pattern is
metafied, while the trial string we now have is unmetafied.  So we can't
do a direct comparison any more without some extra work.  The options I
can see:

1. Give up on the optimisation when we have an unmetafied string.  That
is, we'll still be comparing characters, but in the bowels of the
pattern code --- we won't optimise to a strcmp().  This seems a bad
thing to do when the whole point of the change is as an optimisation.

2. Use a partial optimisation by unmetafying the pattern string on the
fly.  So we're not using memcmp any more, but we'll have a tight loop
over characters and this can be done with local code at the point where
we currently do the memcmp().

3. Compile both metafied and unmetafied variants into the pattern.  This
is wasteful.

4. Have both metafied and unmetafied variants for the pattern when using
a pure string, but only produce, and cache, the unmetafied version when
needed for comparison.  This is more effective than caching the trial
string because the pattern is only compiled once for many uses of it ---
we only lose out here if somebody is looping over a pattern (not just a
trial string as in the glob code) many times, i.e. either redoing
patcompile() or using a pre-compiled pattern, and the latter isn't all that
common in the code (I'm not sure where it does happen if it does).
This seems to push the inefficiency out of inner loops to a frequency
where it's probably not a noticeable factor any more.

5. Deal with both metafied and unmetafied strings in the calling code.
This is a messy last resort.
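
To make option 2 a bit more concrete, here is a rough sketch of the kind
of tight loop it implies.  The names pure_match_unmeta() and META_BYTE
are made up for illustration and are not taken from pattern.c.  It
assumes the usual metafied encoding (a Meta byte, 0x83, followed by the
original byte XORed with 32), a NUL-terminated metafied pattern string,
and an unmetafied trial string passed with an explicit length because it
may contain NULs.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value of zsh's Meta marker; the byte after it is stored ^ 32. */
#define META_BYTE 0x83

/*
 * Hypothetical helper, not a function in the zsh source.
 * Compare a metafied, NUL-terminated pattern string against an
 * unmetafied trial string of trial_len bytes, unmetafying the
 * pattern one character at a time instead of calling memcmp().
 * Returns 1 on an exact match, 0 otherwise.
 */
int
pure_match_unmeta(const char *metapat, const char *trial, size_t trial_len)
{
    const unsigned char *p = (const unsigned char *)metapat;
    const unsigned char *t = (const unsigned char *)trial;
    const unsigned char *tend = t + trial_len;

    assert(metapat != NULL && trial != NULL);

    while (*p) {
        unsigned char c = *p++;

        if (c == META_BYTE) {
            if (!*p)
                return 0;       /* malformed: Meta with nothing after it */
            c = (unsigned char)(*p++ ^ 32);
        }
        /* Trial string exhausted, or the characters differ. */
        if (t == tend || *t++ != c)
            return 0;
    }
    /* Pattern exhausted: match only if the trial string is too. */
    return t == tend;
}
```

The loop costs a test per byte rather than a single library call, but it
needs no allocation and nothing extra stored in the compiled pattern.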

I think both 2. and 4. look promising.
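
As a rough illustration of option 4, the unmetafied copy could be built
lazily and kept alongside the metafied one.  Everything named here
(struct pure_pat, pure_pat_ensure_unmeta(), pure_pat_match_unmeta(),
META_BYTE) is hypothetical and only stands in for whatever the compiled
pattern structure would really hold; the encoding assumed is a Meta byte,
0x83, followed by the original byte XORed with 32.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value of zsh's Meta marker; the byte after it is stored ^ 32. */
#define META_BYTE 0x83

/* Hypothetical stand-in for the pure-string part of a compiled pattern. */
struct pure_pat {
    const char *meta;    /* metafied string, NUL-terminated, always set */
    char *unmeta;        /* unmetafied copy, NULL until first needed */
    size_t unmeta_len;   /* length of unmeta in bytes */
};

/* Build the unmetafied copy on first use; 0 on success, -1 if malloc fails. */
int
pure_pat_ensure_unmeta(struct pure_pat *pp)
{
    const unsigned char *s;
    char *buf;
    size_t len = 0;

    assert(pp != NULL && pp->meta != NULL);
    if (pp->unmeta)
        return 0;                       /* already cached */

    /* The unmetafied form is never longer than the metafied one. */
    buf = malloc(strlen(pp->meta) + 1);
    if (!buf)
        return -1;

    for (s = (const unsigned char *)pp->meta; *s; ) {
        unsigned char c = *s++;

        /* A trailing Meta with nothing after it is copied literally. */
        if (c == META_BYTE && *s)
            c = (unsigned char)(*s++ ^ 32);
        buf[len++] = (char)c;
    }
    pp->unmeta = buf;
    pp->unmeta_len = len;
    return 0;
}

/* Match an unmetafied trial string; 1 match, 0 no match, -1 on error. */
int
pure_pat_match_unmeta(struct pure_pat *pp, const char *trial, size_t trial_len)
{
    if (pure_pat_ensure_unmeta(pp) != 0)
        return -1;
    return trial_len == pp->unmeta_len &&
        memcmp(pp->unmeta, trial, trial_len) == 0;
}
```

The conversion happens once per compiled pattern, after which every
comparison against an unmetafied trial string is a plain memcmp() again.
The owner of the pattern has to free the cached copy along with the rest
of the structure.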

pws


