Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
Re: filename completion with umlauts (again)
- X-seq: zsh-workers 28602
- From: Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
- To: zsh-workers@xxxxxxx
- Subject: Re: filename completion with umlauts (again)
- Date: Sat, 8 Jan 2011 23:21:40 +0000
- In-reply-to: <110108142301.ZM2102@xxxxxxxxxxxxxxxxxxxxxx>
- List-help: <mailto:zsh-workers-help@zsh.org>
- List-id: Zsh Workers List <zsh-workers.zsh.org>
- List-post: <mailto:zsh-workers@zsh.org>
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
- References: <20110106232712.GA11387@xxxxxxxxx> <AANLkTik9unZtuPR-4CM2oKLRT9Soct-XFWmiEajQzbK9@xxxxxxxxxxxxxx> <20110107094419.141d8d67@xxxxxxxxxxxxxxxxxxxxxxxxx> <20110107233459.GA29168@xxxxxxxxx> <110107231048.ZM919@xxxxxxxxxxxxxxxxxxxxxx> <20110108202122.5decaa0b@xxxxxxxxxxxxxxxxxxx> <110108142301.ZM2102@xxxxxxxxxxxxxxxxxxxxxx>
On Sat, 08 Jan 2011 14:22:59 -0800
Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> } The collating order might be potentially a problem if you use literal
> } characters, but that's already fixed in a general way by allowing the
> } syntax:
> }
> } m:{[:upper:][:lower:]}={[:lower:][:upper:]}
>
> The syntax is supported but the handling doesn't appear to be
> special-cased; mb_patmatchindex() does not differ from patmatchindex() in
> its handling of PP_UPPER or PP_LOWER and assumes ranges are numerically
> contiguous.
The relevant code is in Src/Zle/compmatch.c. (There are some references
to matchers in other parts of the completion code, and there's a little
bit of extra help from the regular expression code but that's fairly
trivial.) Equivalence classes are handled by
pattern_match_equivalence(). In every other place equivalence classes
are treated identically to normal character classes.
> What is it that I continue to fail to see?
See any number of while loops over character arrays in compmatch.c; as
one example, the loop at line 529 in match_str(). The various arrays
are simply char *'s and they're not even metafied (if I remember right;
that's how we support 8-bit single byte encodings, by direct
comparison). The place is full of expressions like "w + aoff - aol" and
"l[-(llen + zoff)]". All these arrays need either to be treated as
multibyte strings, with the offset arithmetic done in characters using
mbsrtowcs() and friends, or to be converted to wide characters and back
again at appropriate points. The latter means converting everything
relevant, in some cases potentially losing information, since not
everything on the command line is guaranteed to be a multibyte string
corresponding to a valid character in the current locale. (For example,
you can complete a file name containing ISO-8859-1 characters even when
the locale is UTF-8; this should work even though the characters don't
show up properly.)
If you *can* prove it's trivial, of course...
--
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/