Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: UTF-8 input [was Re: PATCH: zle_params.c]



Bart Schaefer wrote:
> No.  I mean, suppose the user uses the same .zshrc in both an iso-8859-*
> and a UTF-8 locale, and has an explicit bindkey command which is intended
> to work only in the iso-8859-* locale.  That bindkey happens to use a
> character for which, in the UTF-8 locale, mbrtowc() reports incomplete.
> This was in part why I added the footnote asking about plans for UTF-8
> in shell scripts; is it even possible to have the same .zshrc in these
> cases?

UTF-8 should work fine to that extent: it gets passed straight through
from the main shell to zle (or anything else) intact by the usual Meta
mechanism.  (That's why I'm so keen on retaining the current string
representation in the main shell.)  If we keep metafied input strings as
the hash keys for the key binding lookups and they are simply string
arguments to bindkey, then there shouldn't be a problem.  I think.

The bit that doesn't work is when you try to examine individual
characters in the main shell; you will get single bytes, possibly with
the 8th bit set.  I can't think of a simple case where setting up key
bindings would need this to work, however.
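
As a rough illustration of what "single bytes, possibly with the 8th
bit set" means, here is a small sketch in portable shell.  It does not
show zsh internals; the helper names hex_bytes and byte_count are made
up for this example, and the sample character (e-acute, written with
octal escapes) is my own choice.

```shell
# Hypothetical helpers (not zsh builtins): dump a string's bytes in hex
# and count them, independently of the current locale.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \t\n'
}

byte_count() {
  printf '%s' "$1" | wc -c | tr -d ' '
}

# U+00E9 (e-acute) encoded as UTF-8 is the two bytes 0xC3 0xA9.
# Both have the 8th bit set, so code that looks at one byte at a
# time never sees a complete character.
e_acute=$(printf '\303\251')
echo "bytes: $(hex_bytes "$e_acute"), count: $(byte_count "$e_acute")"
# prints "bytes: c3a9, count: 2"
```

In an iso-8859-1 locale the same character would be the single byte
0xE9, which is why byte-wise inspection gives different answers in the
two setups.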

> I'm still worried about the case where that bindkey exists but is for a
> function other than self-insert.  If multibyte translation is handled by
> a widget at the same priority as all other widgets, that "stray" bindkey
> can mess up the whole scheme.

You mean if the input is real UTF-8 and a widget grabs the first byte,
leaving garbage?  Yes, that's a real problem.  I was expecting that the
shell would be set up to handle either old-style input or new-style
input, not a combination, based on what the user (or administrator; this
should all be possible to automate relatively easily) knows about the
system.

To be explicit, either:

- Input system is not UTF-8 aware; "pass8" or equivalent allows 8-bit
  bindings; any zsh bindings for high-eighth-bit bytes are ordinary
  commands.

or:

- Input system is UTF-8 aware; by hypothesis, any high-eighth-bit
  character sent from the terminal is part of a multibyte character
  (this is beyond our control); any zsh bindings for such bytes reflect
  their use as part of a multibyte character.

The zsh bindings would need to be set by whoever decides which is the
case.  I don't see much more we can do within the shell without more
clairvoyance than usual and without breaking someone's setup.  Please
enlighten me.
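
For what it's worth, the "whoever decides" step could be sketched in a
shared .zshrc roughly as follows.  This is only an assumption about how
one might automate it: locale_is_utf8 is a hypothetical helper, the
test on the locale name is a heuristic rather than anything the shell
provides, and the kill-word binding is just an example of an 8-bit
binding.

```shell
# Hypothetical helper, not part of zsh: succeeds when a locale name
# (default: the effective LC_CTYPE setting) names a UTF-8 codeset.
locale_is_utf8() {
  # POSIX precedence for the default: LC_ALL, then LC_CTYPE, then LANG.
  case ${1:-${LC_ALL:-${LC_CTYPE:-${LANG:-}}}} in
    *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
    *) return 1 ;;
  esac
}

# bindkey is a zsh builtin, so only touch bindings when running under zsh.
if [ -n "${ZSH_VERSION:-}" ]; then
  if locale_is_utf8; then
    # UTF-8 case: add no explicit bindings for lone high-bit bytes,
    # since each one is assumed to be part of a multibyte character.
    :
  else
    # 8-bit case: high-bit bytes are ordinary commands.  \M-d is byte
    # 0xE4, which would be a multibyte lead byte in UTF-8.
    bindkey '\M-d' kill-word
  fi
fi
```

The point of the guard is that the "stray" binding never gets
installed in the UTF-8 case, so the two setups stay separate.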

-- 
Peter Stephenson <pws@xxxxxxx>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070
