Zsh Mailing List Archive
Re: UTF-8 input [was Re: PATCH: zle_params.c]
- X-seq: zsh-workers 20757
- From: Peter Stephenson <pws@xxxxxxxxxxxxxxxxxxxxxxxx>
- To: Zsh hackers list <zsh-workers@xxxxxxxxxx>
- Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
- Date: Sun, 30 Jan 2005 01:07:53 +0000
- In-reply-to: <20050129034740.GA21742@xxxxxxxxxxx>
- Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
- References: <200501261806.j0QI6Q2d021854@xxxxxxxxxxxxxx> <20050129034740.GA21742@xxxxxxxxxxx>
Clint Adams wrote:
> > I've left last_isearch since it's not clear what is to become of it
> > yet. Fixing doisearch isn't going to be great fun (240 lines, 2
> > comments). It'll have to wait until we decide about input.
>
> What needs deciding?
At what stage we turn a character from read() into a wide character.
I argued before that key bindings should still use ordinary character
strings to avoid breaking existing bindings. Somewhere before we insert
a character in the line we need to accumulate bytes from multibyte
characters where necessary.
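
As a rough illustration of that accumulation step, here is a minimal sketch using the standard C mbrtowc() interface, fed one byte at a time. The names mb_accum, mb_status, mb_accum_init and mb_accum_feed are hypothetical, not anything in the zsh source, and the result depends on the current locale (set with setlocale()).

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical accumulator: holds the conversion state between bytes. */
struct mb_accum {
    mbstate_t state;
};

enum mb_status { MB_NEED_MORE, MB_COMPLETE, MB_INVALID };

/* Reset to the initial conversion state. */
static void mb_accum_init(struct mb_accum *a)
{
    memset(&a->state, 0, sizeof a->state);
}

/*
 * Feed one byte as returned by read().  On MB_COMPLETE, *out holds the
 * finished wide character.  On MB_NEED_MORE the byte has been absorbed
 * into the state and the caller should supply the next one.
 */
static enum mb_status mb_accum_feed(struct mb_accum *a, char byte, wchar_t *out)
{
    size_t r;

    assert(a != NULL && out != NULL);
    r = mbrtowc(out, &byte, 1, &a->state);
    if (r == (size_t)-2)        /* incomplete but so far valid sequence */
        return MB_NEED_MORE;
    if (r == (size_t)-1) {      /* invalid sequence: state is undefined, reset it */
        mb_accum_init(a);
        return MB_INVALID;
    }
    return MB_COMPLETE;         /* r is 1, or 0 for the null character */
}
```

Key binding lookup could keep working on the raw bytes; only the code that inserts into the line would need to call something like mb_accum_feed().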
I thought of the following: self-insert could take a single character,
as at present, and then test if it was the initial part of a multibyte
character. If it was, it could read the rest; we might need a timeout to
avoid an infinite hang on systems that didn't do multibyte input
properly, which is potentially quite a lot of them. You could then bind
every 8-bit character with the top bit set to self-insert, and voila:
input is completely handled for any multibyte encoding whose 7-bit
subset is ASCII (as in UTF-8), while users remain free to keep their old
8-bit bindings instead.
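
To make the idea concrete, here is a minimal sketch of the "read the rest, with a timeout" step. It assumes UTF-8 specifically (a real version would presumably go through the locale's multibyte functions) and uses POSIX select() and read(). The function names utf8_seq_len, read_byte_timeout and collect_multibyte are made up for illustration and are not existing zle routines.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Length of the UTF-8 sequence introduced by lead byte c: 1 for ASCII,
 * 2 to 4 for a multibyte lead byte, 0 if c cannot start a sequence
 * (a continuation byte or an invalid lead byte).
 */
static int utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if (c >= 0xC2 && c <= 0xDF)
        return 2;
    if (c >= 0xE0 && c <= 0xEF)
        return 3;
    if (c >= 0xF0 && c <= 0xF4)
        return 4;
    return 0;
}

/* Read one byte from fd, waiting at most timeout_ms milliseconds.
 * Returns 1 on success, 0 on timeout, EOF or error. */
static int read_byte_timeout(int fd, int timeout_ms, unsigned char *out)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return 0;
    return read(fd, out, 1) == 1;
}

/*
 * Called with the first byte already read (as self-insert would have it).
 * Collects the remaining bytes into buf, which must hold at least 4 bytes.
 * Returns the sequence length on success, or 0 if the input timed out or
 * was malformed.  *got is always set to the number of bytes stored in buf,
 * so the caller can decide what to do with a partial sequence.
 */
static int collect_multibyte(int fd, unsigned char first, int timeout_ms,
                             unsigned char *buf, size_t *got)
{
    int need = utf8_seq_len(first);
    size_t n = 0;

    assert(buf != NULL && got != NULL);
    buf[n++] = first;
    *got = n;
    if (need == 0)
        return 0;
    while (n < (size_t)need) {
        unsigned char c;

        if (!read_byte_timeout(fd, timeout_ms, &c)) {
            *got = n;
            return 0;           /* timed out: terminal sent no more bytes */
        }
        buf[n++] = c;
        *got = n;
        if ((c & 0xC0) != 0x80)
            return 0;           /* not a continuation byte: malformed */
    }
    return need;
}
```

On failure the bytes consumed so far are still in buf; whether to insert them raw or hand them back to the key binding code is exactly the kind of policy question left open above.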
This leaves other calls to getkey() and other low-level key handling
routines. Some might need the same mechanism; isearch is an example,
because some keys are interpreted while some are inserted into the
search string. A further complication is that when searching the
history we might well want to keep the history lines as multibyte
strings; then the search string remains in that format, too. As this
example indicates I think each case will need considering on its merits.
In addition to getkey() and friends, there is the related matter of the
variable lastchar. Currently this is a single character; I'm not yet
100% sure whether we can keep this, or promote it to a wchar_t, or
whether we might need both types. I fear it may be the last.
--
Peter Stephenson <pws@xxxxxxxxxxxxxxxxxxxxxxxx>
Work: pws@xxxxxxx
Web: http://www.pwstephenson.fsnet.co.uk