Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Surprising behaviour with numeric glob sort



2017-06-05 12:54:39 +0100, Stephane Chazelas:
[...]
> Having NUL sort before any other character would be preferable
> in the C locale that sorts by code point though (like to sort
> UTF-16 text based on codepoint).
[...]

Well, only for UTF-16BE. Given that UTF-16 is mostly used on
Microsoft systems, and there in little-endian form, that would
rarely be useful.

Still, it would be more consistent to have 0 < 1 < ... < 255,
and it would make sure the order is deterministic, which is
generally expected of the C locale. It should only be a matter of
calling memcmp() when we detect the locale (LC_COLLATE) to be C
or POSIX.
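
To illustrate the idea, here is a minimal sketch (not zsh's actual
code). The helper names collate_is_c() and bytewise_cmp() are made
up for the example, and it assumes setlocale(LC_COLLATE, NULL)
reports the locale name as "C" or "POSIX", which may not hold on
every system:

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: nonzero if LC_COLLATE is currently "C" or "POSIX".
   Assumes the query form of setlocale() returns the plain locale name. */
static int collate_is_c(void)
{
    const char *name = setlocale(LC_COLLATE, NULL);
    return name != NULL
        && (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Hypothetical helper: compare two byte strings of known length so that
   0 < 1 < ... < 255, NUL included. memcmp() compares as unsigned char. */
static int bytewise_cmp(const char *a, size_t alen,
                        const char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int diff = memcmp(a, b, n);
    if (diff != 0)
        return diff;
    /* Common prefix is identical: the shorter string sorts first. */
    return (alen > blen) - (alen < blen);
}
```

A caller would check collate_is_c() once and use bytewise_cmp()
instead of strcoll() when it returns nonzero.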

GNU sort seems to treat the NUL byte as the character that sorts
first, and seems to do so with several calls to strcoll() in
non-C locales:

$ printf '%b\n' 'X\0\0A\0B' 'X\0\0A\0\0C' | ltrace -e strcoll sort
sort->strcoll("X", "X")        = 0
sort->strcoll("", "")          = 0
sort->strcoll("A", "A")        = 0
sort->strcoll("B", "")         = 1
XAC
XAB

(I'd say there's scope for optimisation there).
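
A rough sketch of what that trace suggests, as I read it (this is
not GNU sort's actual source; the function name coll_with_nuls() is
made up, and both buffers are assumed to carry a terminating NUL
just past their stated length):

```c
#include <string.h>

/* Hypothetical helper: compare two buffers that may contain NUL bytes
   by calling strcoll() on each NUL-delimited segment in turn.
   Requires a[alen] == '\0' and b[blen] == '\0'. */
static int coll_with_nuls(const char *a, size_t alen,
                          const char *b, size_t blen)
{
    /* Bytes left to examine, counting the final terminator. */
    size_t ra = alen + 1, rb = blen + 1;

    for (;;) {
        int diff = strcoll(a, b);
        if (diff != 0)
            return diff;

        /* Segments collate equal: step past each one and its NUL. */
        size_t sa = strlen(a) + 1, sb = strlen(b) + 1;
        a += sa; ra -= sa;
        b += sb; rb -= sb;

        /* Whichever buffer runs out first sorts first. */
        if (ra == 0)
            return rb == 0 ? 0 : -1;
        if (rb == 0)
            return 1;
    }
}
```

With the two lines from the example above, the fourth call ends up
comparing "B" against the empty segment, so the first line sorts
after the second; an empty segment sorting before any non-empty one
is what makes NUL behave as the lowest character.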

In the C locale, it just calls memcmp().

Cheers,
Stephane


