Zsh Mailing List Archive
Re: filename completion with umlauts (again)
- X-seq: zsh-users 15703
- From: Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
- To: zsh-users@xxxxxxx
- Subject: Re: filename completion with umlauts (again)
- Date: Sat, 8 Jan 2011 20:21:22 +0000
- In-reply-to: <110107231048.ZM919@xxxxxxxxxxxxxxxxxxxxxx>
- List-help: <mailto:zsh-users-help@zsh.org>
- List-id: Zsh Users List <zsh-users.zsh.org>
- List-post: <mailto:zsh-users@zsh.org>
- Mailing-list: contact zsh-users-help@xxxxxxx; run by ezmlm
- References: <20110106232712.GA11387@xxxxxxxxx> <AANLkTik9unZtuPR-4CM2oKLRT9Soct-XFWmiEajQzbK9@xxxxxxxxxxxxxx> <20110107094419.141d8d67@xxxxxxxxxxxxxxxxxxxxxxxxx> <20110107233459.GA29168@xxxxxxxxx> <110107231048.ZM919@xxxxxxxxxxxxxxxxxxxxxx>
On Fri, 07 Jan 2011 23:10:48 -0800
Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> On Jan 8, 12:35am, Andy Spiegl wrote:
> }
> } Uhm, too bad. I am wondering whether case insensitivity in the
> } matcher could be achieved with a different trick?
>
> As I understand it, the problem isn't case insensitivity. The problem
> is (a) representing each set of characters in a manageable syntax and
> (b) efficiently constructing a mapping between the two sets.
>
> This is a tractable problem for single byte characters because there
> is a single fixed ordering and no more than 256 values in each set; for
> multibyte characters, not only is the number of values much larger,
> but also the user-expected collating order is not always the same as
> the numeric order of the underlying encoding.
>
> (And now I fully expect someone to point out that I've got that entirely
> wrong and the trouble really is something else.)
The remaining problem is the multibyte one; the matcher code is heavily
tied to one character per array position in a way that doesn't make it
easy to turn multibyte into wide characters and back (and that doesn't
always make it obvious what the @*!@! it's actually doing with the
array).
The collating order could potentially be a problem if you use literal
characters, but that's already fixed in a general way by allowing the
syntax:

  m:{[:upper:][:lower:]}={[:lower:][:upper:]}

and similar. Basically, any use of {...} allows matching lower and
upper characters generically.
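
A minimal sketch of how a spec like this is typically put to use, assuming the completion system is loaded with compinit. The ':completion:*' context and the empty first matcher-list entry (try an exact match before the case-insensitive one) are illustrative assumptions, not something stated in this message:

```zsh
# Assumption: this lives in ~/.zshrc. The context pattern and the empty
# first matcher are illustrative choices, not taken from the message.
autoload -Uz compinit && compinit

# Try an exact match first, then retry with upper and lower case
# characters treated as equivalent in both directions.
zstyle ':completion:*' matcher-list '' 'm:{[:upper:][:lower:]}={[:lower:][:upper:]}'
```
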
This already works for single byte locales using future-proof library
calls (i.e. things like iswupper() that operate on wide characters);
hence I'm reasonably confident that once we fix the multibyte problem
(if ever) the rest should fall naturally into place.
--
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/