Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: UTF-8 input [was Re: PATCH: zle_params.c]



On Jan 31,  5:01pm, Peter Stephenson wrote:
} Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
}
} Bart Schaefer wrote:
} > No.  I mean, suppose the user uses the same .zshrc in both an iso-8859-*
} > and a UTF-8 locale, and has an explicit bindkey command which is intended
} > to work only in the iso-8859-* locale.
} 
} UTF-8 should work fine to that extent: it gets passed straight through
} from the main shell to zle (or anything else) intact by the usual Meta
} mechanism.

That doesn't answer the question.  When reading the .zshrc (or any other
script) and a byte for which mbrtowc() reports incomplete is found, what
decides whether it's part of a string intended for an iso-8859-* locale
or the introducer of a wide character for a UTF-8 locale?

Is the answer "the file just gets metafied as if it were a binary stream
and individual modules work it out later"?
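
To make the ambiguity concrete, here is a minimal sketch, not zsh code.
The helper names are hypothetical, and the lead-byte ranges are the
standard UTF-8 ones rather than anything taken from the shell source.
The byte 0xE9 is a complete character (e-acute) in iso-8859-1, but in a
UTF-8 locale it can only be the first byte of a three-byte sequence, so
mbrtowc() handed that byte alone would report it as incomplete by
returning (size_t)-2.  Nothing in the byte itself says which reading the
author of the script intended.

```c
/* Hypothetical helper, not part of zsh: how many bytes a UTF-8 sequence
 * beginning with lead byte c is expected to occupy.  Returns 0 if c
 * cannot begin a sequence (continuation byte or invalid lead). */
int utf8_expected_len(unsigned char c)
{
    if (c < 0x80)
        return 1;                       /* plain ASCII */
    if (c >= 0xC2 && c <= 0xDF)
        return 2;                       /* two-byte lead */
    if (c >= 0xE0 && c <= 0xEF)
        return 3;                       /* three-byte lead */
    if (c >= 0xF0 && c <= 0xF4)
        return 4;                       /* four-byte lead */
    return 0;                           /* 0x80-0xC1, 0xF5-0xFF */
}

/* Nonzero if byte c, seen on its own, would look like the start of an
 * unfinished multibyte character under UTF-8 rules. */
int utf8_incomplete_alone(unsigned char c)
{
    return utf8_expected_len(c) > 1;
}
```

With these definitions, utf8_incomplete_alone(0xE9) is true even though
the same byte is a perfectly good single character in iso-8859-1, which
is exactly the case the question above is about.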

} > If multibyte translation is handled by a widget at the same priority
} > as all other widgets, that "stray" bindkey can mess up the whole
} > scheme.
} 
} You mean if the input is real UTF-8 and a widget grabs the first byte,
} leaving garbage?  Yes, that's a real problem.  I was expecting that the
} shell would either be set up to handle old-style input, or new style
} input, not a combination

In other words, you assume that nobody will try to use the same .zshrc in 
two different locales, or at least not without wrapping bits of it in
tests of the value of LC_CTYPE or the like.
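
The failure mode with a "stray" binding can be illustrated with a toy
model.  This is only a sketch under stated assumptions: it is not how
zle dispatches keys, the function name and the bound[] table are
hypothetical, and the decoder recognizes only ASCII and two-byte UTF-8
sequences to keep it short.  A single-byte binding is assumed to consume
its byte before any multibyte decoding gets a chance.

```c
#include <stddef.h>

/* Toy model, not zle.  bound[b] != 0 means a hypothetical single-byte
 * key binding exists for byte b and swallows it before decoding.
 * Returns how many input bytes are left over as undecodable garbage.
 * Only ASCII and two-byte UTF-8 sequences are recognized. */
size_t stray_bytes(const unsigned char *in, size_t n,
                   const unsigned char bound[256])
{
    size_t stray = 0;
    size_t i = 0;

    while (i < n) {
        unsigned char c = in[i];

        if (bound[c]) {                 /* a widget grabbed this byte */
            i++;
        } else if (c < 0x80) {          /* ASCII decodes on its own */
            i++;
        } else if (c >= 0xC2 && c <= 0xDF && i + 1 < n &&
                   (in[i + 1] & 0xC0) == 0x80) {
            i += 2;                     /* complete two-byte character */
        } else {
            stray++;                    /* orphaned or unsupported byte */
            i++;
        }
    }
    return stray;
}
```

Feeding it the UTF-8 encoding of e-acute (0xC3 0xA9) gives no stray
bytes when nothing is bound, but one stray byte (the orphaned 0xA9) as
soon as 0xC3 is bound, as it might be by a binding written for the
character that byte represents in iso-8859-1.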

} I don't see much more we can do within the shell without more
} clairvoyance than usual and without breaking someone's setup.  Please
} enlighten me.

I don't (yet?) know what else we can do, either; I'm just pointing out
issues to make sure they've been considered.

A question that comes to mind is, how will the shell deal with UTF-8
input when ZLE is not enabled?


