Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
Re: Idea for optimization (use case: iterate string with index parameter)
- X-seq: zsh-workers 42232
- From: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
- To: "zsh-workers@xxxxxxx" <zsh-workers@xxxxxxx>
- Subject: Re: Idea for optimization (use case: iterate string with index parameter)
- Date: Fri, 5 Jan 2018 14:23:57 -0800
- In-reply-to: <etPan.5a4f7fdd.52e15119.14e5a@zdharma.org>
- List-help: <mailto:zsh-workers-help@zsh.org>
- List-id: Zsh Workers List <zsh-workers.zsh.org>
- List-post: <mailto:zsh-workers@zsh.org>
- List-unsubscribe: <mailto:zsh-workers-unsubscribe@zsh.org>
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
- References: <etPan.5a4f7fdd.52e15119.14e5a@zdharma.org>
On Fri, Jan 5, 2018 at 5:38 AM, Sebastian Gniazdowski
<psprint@xxxxxxxxxxx> wrote:
> Iterating a string with an index parameter is quite slow, because Unicode characters have to be skipped and counted using mbrtowc().
I can't remember the last time I needed to do that kind of iteration.
> For example, z-sy-h uses such loops, and my projects sometimes use them too. The point is that iterating over a string and doing something with its letters, e.g. counting brackets, is a very common use case, so the optimization would be triggered often.
Hmm. Whether this is worthwhile depends on the size of the typical
processed string. I can see this affecting z-sy-h when e.g. running
zed on a big function, but probably not when editing a typical command
line.
Maybe it would be reasonable to do something in shell code, e.g.:
    typeset -a iter=(${(s//)string})   # split $string into characters
    for ((i=1; i <= $#iter; i++)); do
      : $iter[i]   # do something with $iter[i]
    done
    string=${(j//)iter}   # rejoin, if needed
That is more memory-intensive, of course, but it also helps with
random (out-of-order) access into the array of characters.
> In general, the array would hold the last N (5-10 or so) string-index requests. If a new request targets the same string with an index greater by 1, getarg() would call mbrtowc() only once (via the MB_METACHARLEN macro), reusing the previous in-string pointer.
Why only when greater by 1? If the index is greater, scan forward from
the cached pointer to the requested position and record that. Overall
it's the same number of mbrtowc() conversions.