Zsh Mailing List Archive

Re: filename completion with umlauts (again)

X-seq: zsh-users 15703
From: Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
To: zsh-users@xxxxxxx
Subject: Re: filename completion with umlauts (again)
Date: Sat, 8 Jan 2011 20:21:22 +0000
In-reply-to: <110107231048.ZM919@xxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:zsh-users-help@zsh.org>
List-id: Zsh Users List <zsh-users.zsh.org>
List-post: <mailto:zsh-users@zsh.org>
Mailing-list: contact zsh-users-help@xxxxxxx; run by ezmlm
References: <20110106232712.GA11387@xxxxxxxxx> <AANLkTik9unZtuPR-4CM2oKLRT9Soct-XFWmiEajQzbK9@xxxxxxxxxxxxxx> <20110107094419.141d8d67@xxxxxxxxxxxxxxxxxxxxxxxxx> <20110107233459.GA29168@xxxxxxxxx> <110107231048.ZM919@xxxxxxxxxxxxxxxxxxxxxx>

On Fri, 07 Jan 2011 23:10:48 -0800
Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> On Jan 8, 12:35am, Andy Spiegl wrote:
> }
> } Uhm, too bad.  I am wondering whether case insensitivity in the
> } matcher could be achieved with a different trick?
> 
> As I understand it, the problem isn't case insensitivity.  The problem
> is (a) representing each set of characters in a manageable syntax and
> (b) efficiently constructing a mapping between the two sets.
> 
> This is a tractable problem for single byte characters because there
> is a single fixed ordering and no more than 256 values in each set; for
> multibyte characters, not only is the number of values much larger,
> but also the user-expected collating order is not always the same as
> the numeric order of the underlying encoding.
> 
> (And now I fully expect someone to point out that I've got that entirely
> wrong and the trouble really is something else.)

The remaining problem is the multibyte one; the matcher code is heavily
tied to one character per array position in a way that doesn't make it
easy to turn multibyte into wide characters and back (and that doesn't
always make it obvious what the @*!@! it's actually doing with the
array).

The collating order could be a problem if you use literal characters,
but that's already fixed in a general way by allowing the syntax:

  m:{[:upper:][:lower:]}={[:lower:][:upper:]}

and similar --- basically, any use of {...} allows matching lowercase
and uppercase characters generically.

This already works for single byte locales using future-proof library
calls (i.e. things like iswupper() that operate on wide characters);
hence I'm reasonably confident that once we fix the multibyte problem
(if ever) the rest should fall naturally into place.
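
As a minimal sketch of how a spec like the one above is usually put to
use (assuming the new completion system has already been loaded with
compinit, and with the ':completion:*' context chosen here only for
illustration), it can be installed through the matcher-list style:

  # Assumption: compinit has been run; the context pattern is illustrative.
  zstyle ':completion:*' matcher-list \
    'm:{[:upper:][:lower:]}={[:lower:][:upper:]}'

In a single-byte locale this should let a lowercase letter typed on the
command line match its uppercase counterpart in a file name, and vice
versa.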

-- 
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/
