Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author

Improving arrayuniq

X-seq: zsh-workers 30383
From: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
To: zsh-workers@xxxxxxx
Subject: Improving arrayuniq
Date: Wed, 04 Apr 2012 01:08:07 -0700
In-reply-to: <120404005237.ZM10249@torch.brasslantern.com>
List-help: <mailto:zsh-workers-help@zsh.org>
List-id: Zsh Workers List <zsh-workers.zsh.org>
List-post: <mailto:zsh-workers@zsh.org>
Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
References: <CABZhJg-xieMZAOEmbYzSb5T-3O8aLNurvPk=SiXsPGjTpAMg-Q@mail.gmail.com> <120404005237.ZM10249@torch.brasslantern.com>

} Anybody want to have a stab at creating a vastly more efficient version
} of Src/params.c:arrayuniq() ?  I came up with a faster way to remove the
} duplicate elements, but the problem here is to efficiently determine
} that there are no duplicates to remove.  Perhaps throw the elements into
} a hash table and then make one pass looking up each element.

Here's the improved way to collapse out the duplicates (one pass instead
of shifting the whole remainder of the array each time a duplicate is
found).  I haven't changed the basic find-duplicates algorithm, which is
optimized for space rather than speed.


/**/
static void
arrayuniq(char **x, int freeok)
{
    char **t, **p = x;
    char *hole = "";

    /* Find duplicates and replace them with holes */
    while (*++p)
	for (t = x; t < p; t++)
	    if (*t != hole && !strcmp(*p, *t)) {
		if (freeok)
		    zsfree(*p);
		*p = hole;
		break;
	    }
    /* Swap non-holes into holes in optimal jumps */
    for (p = t = x; *t != NULL; t++) {
	if (*t == hole) {
	    while (*p == hole)
		++p;
	    if ((*t = *p) != NULL)
		*p++ = hole;
	} else if (p == t)
	    p++;
    }
    /* Erase all the remaining holes, just in case */
    while (++t < p)
	*t = NULL;
}

Messages sorted by: Reverse Date, Date, Thread, Author