Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
Re: Surprising behaviour with numeric glob sort
- X-seq: zsh-workers 41208
- From: Stephane Chazelas <stephane.chazelas@xxxxxxxxx>
- To: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
- Subject: Re: Surprising behaviour with numeric glob sort
- Date: Sat, 3 Jun 2017 22:16:46 +0100
- Cc: Zsh hackers list <zsh-workers@xxxxxxx>
- In-reply-to: <170602161905.ZM10488@torch.brasslantern.com>
- List-help: <mailto:zsh-workers-help@zsh.org>
- List-id: Zsh Workers List <zsh-workers.zsh.org>
- List-post: <mailto:zsh-workers@zsh.org>
- Mail-followup-to: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>, Zsh hackers list <zsh-workers@xxxxxxx>
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
- References: <20170531212453.GA31563@chaz.gmail.com> <170601152943.ZM4783@torch.brasslantern.com> <20170602090332.GA6574@chaz.gmail.com> <170602161905.ZM10488@torch.brasslantern.com>
2017-06-02 16:19:05 -0700, Bart Schaefer:
> On Jun 2, 10:03am, Stephane Chazelas wrote:
> }
> } $ echo *(n)
> } zsh-10 zsh2 zsh10 zsh-3
> }
> } (here in my en_GB.UTF-8 GNU locale)
> }
> } is unexpected/broken. "zsh" sorts before "zsh-" in my locale, so
> } I'd expect the zsh2, zsh10 to come before zsh-3, zsh-10 which is
> } the basis of my proposal. In any case, zsh-3 should come before
> } zsh-10, nobody can argue against that.
>
> Well, one could argue that "-10" should be treated as negative ten
> and therefore should sort before negative three, but I'm not sure
> we want to get into that.
My main use for *(n), at least, is sorting version numbers like
zsh-3.0, zsh-3.1, zsh-4, so handling negative numbers wouldn't
help in those cases.
[...]
> That is, "zsh-3" is never
> compared numerically to "zsh2" because "zsh2" and "zsh-" are
> considered already to differ.
[...]
> So I think what you propose is that when "zsh1" is found to have a
> difference with "zsh-", the algorithm should look forward across
> "zsh-" to find "3" and at that point end up comparing "10" to "3"?
> That would lead to the order in your example becoming
> zsh2 zsh-3 zsh10 zsh-10.
[...]
No, what I propose is very simple.
When comparing "zsh-3" with "zsh2", we compare the non-numeric
prefixes "zsh-" and "zsh". Already at that point, "zsh" is less
than "zsh-", so we stop there (zsh2 < zsh-3).
If it were zsh-3.1 vs zsh-3, we would compare
["zsh-", 3, ".", 1] vs ["zsh-", 3]:
- strcoll("zsh-", "zsh-") => 0
- 3 == 3
- strcoll(".", "") > 0 => zsh-3 < zsh-3.1
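For illustration, the token-by-token comparison described above could be sketched in C roughly as follows. This is a minimal sketch and not zsh's actual code. The names numcoll(), text_len(), cmp_digits() and dup_run() are hypothetical. Digit runs are compared by length after stripping leading zeros, so long numbers cannot overflow. Non-digit runs are compared with strcoll() in whatever LC_COLLATE locale is in effect.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Length of the run of non-digit characters starting at s. */
static size_t text_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && !isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Compare two digit runs numerically without converting them to
 * integers: skip leading zeros (keeping at least one digit); then a
 * longer run is the larger number, else compare digit by digit. */
static int cmp_digits(const char *a, size_t alen, const char *b, size_t blen)
{
    while (alen > 1 && *a == '0') { a++; alen--; }
    while (blen > 1 && *b == '0') { b++; blen--; }
    if (alen != blen)
        return alen < blen ? -1 : 1;
    return memcmp(a, b, alen);
}

/* NUL-terminated copy of the first n bytes of s (strcoll() needs
 * terminated strings). Aborts on allocation failure, as this is a sketch. */
static char *dup_run(const char *s, size_t n)
{
    char *p = malloc(n + 1);
    if (p == NULL)
        abort();
    memcpy(p, s, n);
    p[n] = '\0';
    return p;
}

/* Hypothetical comparator: view both strings as alternating
 * [text, number, text, number, ...] runs, compare text runs with
 * strcoll() and number runs numerically, and stop at the first
 * difference. */
int numcoll(const char *a, const char *b)
{
    for (;;) {
        size_t ta = text_len(a), tb = text_len(b);
        char *sa = dup_run(a, ta), *sb = dup_run(b, tb);
        int r = strcoll(sa, sb);
        free(sa);
        free(sb);
        if (r != 0)
            return r;
        a += ta;
        b += tb;
        /* Text runs are equal. If either string has ended, the one
         * that ended sorts first (both ended means equal). */
        if (*a == '\0' || *b == '\0')
            return (*a != '\0') - (*b != '\0');
        /* Both strings are now at a digit run. */
        size_t da = 0, db = 0;
        while (isdigit((unsigned char)a[da]))
            da++;
        while (isdigit((unsigned char)b[db]))
            db++;
        r = cmp_digits(a, da, b, db);
        if (r != 0)
            return r;
        a += da;
        b += db;
    }
}
```

With no setlocale() call (the "C" locale), numcoll("zsh2", "zsh-3") is negative because "zsh" sorts before "zsh-". numcoll("zsh-3", "zsh-3.1") is negative because strcoll("", ".") is. To sort an array of names, numcoll() could be wrapped in a qsort() comparator.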
Now, there are some aspects of the current implementation that
one might find useful, such as:
$ echo *
a a-3.1 a-3+1 a-3.2 a-3+2
$ (LC_ALL=C; echo *)
a a-3+1 a-3+2 a-3.1 a-3.2
$ echo *(n)
a a-3.1 a-3+1 a-3.2 a-3+2
$ (LC_ALL=C; echo *(n))
a a-3+1 a-3+2 a-3.1 a-3.2
The fact that "-" and "." are ignored in the first strcoll()
pass in some locales makes for a more "numerical" sort. But
again, it's easily broken:
$ touch a-3.10
$ echo *(n)
a a-3.1 a-3+1 a-3.2 a-3.10 a-3+2
Ideally, we'd want to hook into the strcoll() algorithm to
introduce the numerical comparisons there. Maybe that can be
done with zero-padding: for the example above, transform the
strings (a sort of pre-strxfrm() step) and then compare the
results with strcoll(). That is, transform:
a a-3.1 a-3+1 a-3.2 a-3.10 a-3+2
into:
a
a-03.01
a-03+01
a-03.02
a-03.10
a-03+02
adjusting the padding width as needed.
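A minimal sketch of that padding step follows. It assumes ASCII decimal digits and a single pad width shared by all the strings being sorted. pad_numbers() and max_digit_run() are hypothetical helpers. The second one returns the longest digit run in one string, and a caller would take the maximum over all the strings to choose the width.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest run of decimal digits in s. */
size_t max_digit_run(const char *s)
{
    size_t best = 0, cur = 0;
    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s)) {
            if (++cur > best)
                best = cur;
        } else {
            cur = 0;
        }
    }
    return best;
}

/* Return a newly malloc()ed copy of s in which every run of digits is
 * left-padded with '0' to at least `width` characters, e.g. with
 * width 2, "a-3.10" becomes "a-03.10". Returns NULL if malloc() fails. */
char *pad_numbers(const char *s, size_t width)
{
    size_t len = strlen(s);
    /* Each digit run grows to at most `width` per original character. */
    char *out = malloc(len * (width > 1 ? width : 1) + 1);
    if (out == NULL)
        return NULL;
    char *o = out;
    while (*s != '\0') {
        if (isdigit((unsigned char)*s)) {
            size_t n = 0;
            while (isdigit((unsigned char)s[n]))
                n++;
            for (size_t i = n; i < width; i++)
                *o++ = '0';          /* leading zero padding */
            memcpy(o, s, n);         /* the digits themselves */
            o += n;
            s += n;
        } else {
            *o++ = *s++;             /* non-digits are copied as-is */
        }
    }
    *o = '\0';
    return out;
}
```

Digit runs already longer than the width are copied unchanged. That is why the width has to track the longest run among the strings being compared.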
In my GNU British locale, the above would sort to:
a
a-03.01
a-03+01
a-03.02
a-03+02
a-03.10
and in the C locale to:
a
a-03+01
a-03+02
a-03.01
a-03.02
a-03.10
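The padded strings are only used for collation, so a sketch could keep displaying the original names and sort them by keys computed once from the padded forms. Here strxfrm() builds those keys, so strcmp() on the keys gives the same order that strcoll() would give on the padded strings. sort_collated() and struct keyed are hypothetical names. The padded forms are assumed to come from a separate padding step and are passed in as a parallel array.

```c
#include <stdlib.h>
#include <string.h>

/* One item to sort: the displayed name plus its collation key. */
struct keyed {
    const char *name;  /* what is displayed */
    char *key;         /* strxfrm() of the padded form */
};

/* Compute item->key from the padded form of the name. */
static int make_key(struct keyed *item, const char *name, const char *padded)
{
    size_t n = strxfrm(NULL, padded, 0);  /* key length, excluding NUL */
    item->name = name;
    item->key = malloc(n + 1);
    if (item->key == NULL)
        return -1;
    strxfrm(item->key, padded, n + 1);
    return 0;
}

static int cmp_keyed(const void *x, const void *y)
{
    const struct keyed *a = x, *b = y;
    return strcmp(a->key, b->key);
}

/* Reorder names[0..n-1] in place by the LC_COLLATE order of the
 * corresponding padded[i] strings (padded itself is left untouched).
 * Returns 0 on success, -1 on allocation failure. */
int sort_collated(const char **names, const char **padded, size_t n)
{
    struct keyed *items = calloc(n ? n : 1, sizeof *items);
    if (items == NULL)
        return -1;
    int rc = 0;
    for (size_t i = 0; i < n && rc == 0; i++)
        rc = make_key(&items[i], names[i], padded[i]);
    if (rc == 0) {
        qsort(items, n, sizeof *items, cmp_keyed);
        for (size_t i = 0; i < n; i++)
            names[i] = items[i].name;
    }
    for (size_t i = 0; i < n; i++)
        free(items[i].key);  /* calloc() left unused keys NULL */
    free(items);
    return rc;
}
```

Without a setlocale() call, sorting the six names above with their padded forms gives a a-3+1 a-3+2 a-3.1 a-3.2 a-3.10, which matches the C-locale order. After setlocale(LC_ALL, "") in a locale such as en_GB.UTF-8, the keys would follow that locale's collation rules instead.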
--
Stephane