Zsh Mailing List Archive
Re: Surprising behaviour with numeric glob sort
2017-06-05 12:54:39 +0100, Stephane Chazelas:
[...]
> Having NUL sort before any other character would be preferable
> in the C locale that sorts by code point though (like to sort
> UTF-16 text based on codepoint).
[...]
Well, only for UTF-16BE. Given that UTF-16 is mostly used on
Microsoft systems, and as little endian, that would rarely be useful.
Still, it would be more consistent to have 0 < 1 < ... < 255,
and it would make sure the order is deterministic, which is generally
expected of the C locale. It should only be a matter of calling
memcmp() when we detect the locale (LC_COLLATE) to be C or
POSIX.
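To illustrate the idea, here is a minimal sketch of what such a check
could look like. It is not zsh's actual code: collate_is_c(),
bytes_cmp() and sort_cmp() are hypothetical names, and it assumes
setlocale(LC_COLLATE, NULL) reports the locale as exactly "C" or
"POSIX".

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if LC_COLLATE is currently "C" or "POSIX". */
int collate_is_c(void)
{
    const char *loc = setlocale(LC_COLLATE, NULL);
    return loc && (strcmp(loc, "C") == 0 || strcmp(loc, "POSIX") == 0);
}

/* Hypothetical helper: order two buffers by unsigned byte value,
 * so 0 < 1 < ... < 255; a buffer that is a prefix of the other sorts first. */
int bytes_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int r = n ? memcmp(a, b, n) : 0;

    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);
}

/* Hypothetical entry point: use memcmp() ordering in the C/POSIX locale,
 * strcoll() otherwise. Assumes a[alen] and b[blen] are NUL terminators.
 * Note that strcoll() only sees each buffer up to its first NUL byte. */
int sort_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    if (collate_is_c())
        return bytes_cmp(a, alen, b, blen);
    return strcoll(a, b);
}
```

The lengths are passed explicitly because memcmp() is what lets embedded
NUL bytes take part in the comparison at all.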
GNU sort seems to treat the NUL byte as the character that
sorts first, and in non-C locales seems to do so with several calls
to strcoll(), one per NUL-delimited segment:
$ printf '%b\n' 'X\0\0A\0B' 'X\0\0A\0\0C' | ltrace -e strcoll sort
sort->strcoll("X", "X") = 0
sort->strcoll("", "") = 0
sort->strcoll("A", "A") = 0
sort->strcoll("B", "") = 1
XAC
XAB
(I'd say there's scope for optimisation there).
In the C locale, it just calls memcmp().
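The segment-by-segment behaviour seen in the ltrace output above could
be sketched roughly as follows. This is a guess at the approach, not GNU
sort's actual implementation: coll_with_nuls() is a hypothetical name,
and both buffers are assumed to carry a terminating NUL at a[alen] and
b[blen].

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collate two buffers that may contain NUL bytes by
 * calling strcoll() on each NUL-delimited segment in turn. */
int coll_with_nuls(const char *a, size_t alen, const char *b, size_t blen)
{
    const char *aend = a + alen;
    const char *bend = b + blen;

    for (;;) {
        /* strcoll() compares up to the next NUL in each buffer. */
        int r = strcoll(a, b);
        if (r != 0)
            return r;

        /* The segments collate equal: move to the NUL that ends each one. */
        a += strlen(a);
        b += strlen(b);

        /* If either buffer is exhausted, the one that ended sorts first. */
        if (a == aend || b == bend)
            return (a != aend) - (b != bend);

        /* Both NULs are embedded ones: skip them and compare what follows. */
        a++;
        b++;
    }
}
```

With the two inputs from the transcript this makes the same four
strcoll() calls, ending with strcoll("B", ""), so the line containing C
sorts before the one containing B.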
Cheers,
Stephane