Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author

Re: Substitution ${...///} slows down when certain UTF character occurs

X-seq: zsh-workers 36654
From: Sebastian Gniazdowski <sgniazdowski@xxxxxxxxx>
To: zsh-workers@xxxxxxx
Subject: Re: Substitution ${...///} slows down when certain UTF character occurs
Date: Sun, 27 Sep 2015 10:13:11 +0200
In-reply-to: <150926134410.ZM17546@torch.brasslantern.com>
List-id: Zsh Workers List <zsh-workers.zsh.org>
References: <CAKc7PVBuCuLux8WhBORuYo_vQUQ18OP-XMQXWdYkL84pFLt+LQ@mail.gmail.com> <150926134410.ZM17546@torch.brasslantern.com>

On 26 September 2015 at 22:44, Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> On Sep 26,  2:19pm, Sebastian Gniazdowski wrote:
> }
> } I attach a script that does ${...///} substitution.
>
> I worry that the attachment hasn't come through correctly?  When I
> unpack the base64 into text, I get (in part)
>
> str="c4d5148ca6 ce3a2d24203abfb385 30f5fe85434ae ... 5d468f6"
>
> Is the value of $str supposed to look like that?  So the pattern in
> the ${str//...} replacement never matches?

Yes. I attached the string instead of the code that generated it:
# cat /dev/urandom | env LC_CTYPE=C tr -cd 'a-f0-9 ' | head -c 120000
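
For anyone who wants to reproduce this without the attachment, here is a
minimal sketch. It assumes a shell that supports ${var//pattern/} (zsh or
bash). The length of 2000 and the pattern Z are placeholders chosen for
illustration; they are not taken from the attached script.

```shell
# Generate test data the same way as the command above, but shorter so
# the example finishes quickly (2000 is an arbitrary, assumed length).
str=$(LC_ALL=C tr -cd 'a-f0-9 ' < /dev/urandom | head -c 2000)

# Hypothetical pattern: 'Z' cannot occur in a string made only of
# a-f, 0-9 and spaces, so the substitution never matches and the
# string comes back unchanged.
out=${str//Z/}

echo "input length: ${#str}, output length: ${#out}"
```

Because the pattern never matches, both lengths should be equal; only the
time taken is expected to vary with the pattern.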

> } It  is very slow for some chars and very fast for others. How to explain
> } and hopefully fix this?
>
> Each time pattryrefs() fails to find a match, it increments the area
> to be searched by one character and then tries the entire pattern
> match again.  So for a 120000-character string, it's doing a non-
> matching search 120000 times.

It's a big plus that the substitution is still near-instant for strings
of that length when there is no unlucky Unicode character.
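
To illustrate the behaviour described above, here is a rough sketch in
shell of a search that retries one character further along after each
failed attempt. count_attempts is a hypothetical helper, not zsh's
pattryrefs(), and it treats the pattern as a literal string for simplicity.

```shell
# Count how many anchored match attempts a naive "retry one character
# further along" search makes. Hypothetical helper for illustration only.
count_attempts() {
  local s=$1 pat=$2 attempts=0 i=0 n=${#1}
  while [ "$i" -lt "$n" ]; do
    attempts=$((attempts + 1))
    # Try to match the (literal) pattern anchored at offset i.
    case "${s:$i}" in
      "$pat"*) break ;;
    esac
    i=$((i + 1))
  done
  echo "$attempts"
}

# 'Z' never occurs in the data, so every offset is tried once.
count_attempts "c4d5148ca6 ce3a2d24203abfb385" "Z"
```

When nothing matches, the number of attempts equals the length of the
string, so any extra cost per attempt is multiplied by 120000 for the
string in question.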

> I rewrote your test to use "float SECONDS" + "print $SECONDS" instead
> of forking off subshells for "time" and to use loops so I didn't have
> to comment things in and out.  Observations:
>
> 1. It's only fast for the Yen symbol, which is the only one that does
> not have a byte with the high-order bit set.  This case is avoiding
> this block in pattern.c:

For me (OSX / zsh 5.0.2) it was fast for the characters at even positions
in what I attached, i.e. for ¥, Ł and Ǟ. I didn't think it could differ
between environments, so I have now run the test on other machines.
Ubuntu 12.10 / zsh 5.0.0 behaves the same. On FreeBSD / zsh 5.1.1-dev-0
(HEAD 50721a1 and 8d5c0c) it's different: the fast characters are ¥ and Ł.
zsh 5.1.1-dev-0 (HEAD 50721a1 and 8d5c0c) on OSX behaves the same as
the FreeBSD case.

Best regards,
Sebastian Gniazdowski

Follow-Ups:
- Re: Substitution ${...///} slows down when certain UTF character occurs
  - From: Bart Schaefer

References:
- Substitution ${...///} slows down when certain UTF character occurs
  - From: Sebastian Gniazdowski
- Re: Substitution ${...///} slows down when certain UTF character occurs
  - From: Bart Schaefer
