Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: PATCH: multibyte characters in patterns.



Vincent Lefevre wrote:
> Could you give examples of what it does exactly?
> Do you mean that "?" can now match a multibyte character?

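Yes.  If X is a multibyte character consisting of two bytes in the
current locale (say a with a grave accent in a UTF-8 locale), the
following are both true, since with (#U) each ? matches a single byte
while with (#u) a single ? matches the whole character: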

[[ X = (#U)?? ]]
[[ X = (#u)? ]]

> Will it also match a UTF-8 character while being in ISO-8859-1 locales?
> (The reason could be to be able to handle data that use another encoding
> than the locales, mainly when data are shared amongst different users
> who use different locales, in which case these data are encoded in UTF-8
> in general.)

You should be able to do this by locally altering the locale, since the
various variables (LANG, LC_*) are special in zsh and will perform the
appropriate setlocale() calls---as long as the system library supports
the locale, obviously.  Making the variable local should be good enough
since specials are set and restored with the correct function calls.
However, I haven't tried this.  (This ability is already present---the
only relevant thing I've changed is that patterns will obey the locale.)
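
For what it's worth, at the C library level the set-and-restore
described above might look roughly like the sketch below.  This is not
zsh's actual code: count_chars_in_locale() is a hypothetical helper, and
whatever locale name you pass in (for example "en_US.UTF-8") is an
assumption that only works if the system has that locale installed.  It
counts characters rather than bytes under the temporary locale, which is
the same distinction a locale-aware "?" has to make.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, for illustration only (not taken from zsh).
 * Counts the characters in s as seen by the named locale, then puts
 * the previous LC_CTYPE setting back.  Returns -1 if the locale is
 * unavailable or s is not valid in that locale's encoding.
 */
long count_chars_in_locale(const char *s, const char *locale_name)
{
    /* The string returned by setlocale(cat, NULL) may be overwritten
     * by later setlocale() calls, so keep a private copy. */
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved;
    long count = -1;

    if (cur == NULL)
        return -1;
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_CTYPE, locale_name) != NULL) {
        mbstate_t state;
        size_t left = strlen(s);

        memset(&state, 0, sizeof state);
        count = 0;
        while (left > 0) {
            /* mbrlen() reports how many bytes the next character uses. */
            size_t n = mbrlen(s, left, &state);

            /* (size_t)-1: invalid sequence; (size_t)-2: truncated one.
             * n == 0 should not occur, since left excludes the NUL. */
            if (n == (size_t)-1 || n == (size_t)-2 || n == 0) {
                count = -1;
                break;
            }
            s += n;
            left -= n;
            count++;
        }
    }

    /* Restore the caller's locale whether or not the switch worked. */
    setlocale(LC_CTYPE, saved);
    free(saved);
    return count;
}
```

Calling count_chars_in_locale("b\xc3\xa0r", "en_US.UTF-8") should see
three characters in four bytes if that locale exists, and returns -1 if
it doesn't.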

> How about that in UTF-8 locales?
> 
> dixsept:~> foo="bàr"
> dixsept:~> echo $foo[2]

I haven't done anything with parameters yet, so subscripts like
$foo[2] currently operate on bytes, but this will be fixed eventually.
The MULTIBYTE option will apply and we'll presumably need parameter
flags equivalent to the globbing flags; unfortunately this time even
(u) and (U) are already taken.

> Couldn't an "unused" area of Unicode be used for arbitrary bytes?

I suppose that's possible, but it's not guaranteed (and we don't
require) that a wchar_t is a Unicode character at all; if I've finally
understood the __STDC_ISO_10646__ stuff, there seem to be quite a few
systems where it isn't.
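
A quick way to see which kind of system you're on is to test the macro
at compile time.  The sketch below is only an illustration:
wchar_is_iso10646() and iso10646_version() are hypothetical names made
up here, not anything in zsh.

```c
#include <wchar.h>

/*
 * Returns 1 if the implementation promises that wchar_t values are
 * ISO 10646 (Unicode) code points, 0 if it makes no such promise.
 */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/*
 * When defined, the macro is an integer of the form yyyymmL naming the
 * ISO 10646 revision the implementation follows.  Returns 0 when the
 * macro is not defined.
 */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return (long)__STDC_ISO_10646__;
#else
    return 0L;
#endif
}
```

Where wchar_is_iso10646() returns 0, a wide character value can't be
assumed to be a code point, so reserving a Unicode range for arbitrary
bytes wouldn't mean anything there.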

-- 
Peter Stephenson <pws@xxxxxxx>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


