Zsh Mailing List Archive

Re: multibyte backwarddeletechar

X-seq: zsh-workers 16093
From: Clint Adams <clint@xxxxxxx>
To: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
Subject: Re: multibyte backwarddeletechar
Date: Sun, 21 Oct 2001 14:21:06 -0400
Cc: zsh-workers@xxxxxxxxxx
In-reply-to: <1011021171339.ZM14059@xxxxxxxxxxxxxxxxxxxxxxx>; from schaefer@xxxxxxxxxxxxxxxx on Sun, Oct 21, 2001 at 05:13:38PM +0000
Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
References: <20011021114254.A17952@xxxxxxxx> <1011021171339.ZM14059@xxxxxxxxxxxxxxxxxxxxxxx>

> I'm a bit surprised that this wouldn't cause significant confusion in
> the ZLE display code.  How did the multi-byte character get input in
> the first place?  Is it displayed as occupying one character position
> on the screen, or several?  If only one, doesn't the cursor end up in
> the wrong place on most word- or line-oriented motions that cross it?

That depends on the terminal emulator and font.  If I run
LANG=zh_TW.Big5 crxvt -ls -fm taipei16 -fn 8x16 -km big5 ,
each BIG5 character (2 octets) appears to take up the
vertical space of one ASCII character, and the horizontal space
of two ASCII characters.  If I run
LANG=zh_TW.Big5 crxvt -ls -fm taipei14 -fn 8x16 -km big5 ,
each BIG5 character (2 octets) appears to take up the
vertical space of one ASCII character, and the horizontal space
of two and a half (2.5) ASCII characters, although crxvt
does some ugly overlapping, with the result that ZLE does not get confused.
If I run LANG=ja_JP.UTF-8 xterm -class UXTerm ,
each UTF-8 Kanji character (3 octets) appears to take up
the same (2 horizontal, 1 vertical) space.  In this case,
ZLE does get horribly confused.  If I run
LANG=ru_RU.UTF-8 xterm -class UXTerm ,
each UTF-8 Cyrillic character (2 octets) appears to take
up the horizontal and vertical space of one ASCII character.
This also makes ZLE horribly confused.  If I run
LANG=fr_FR.UTF-8 xterm -class UXTerm ,
each UTF-8 French non-ASCII character (2 octets)
appears to take up the horizontal and vertical space of one
ASCII character.  Again, this confuses ZLE.
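In every UTF-8 case above, the number of octets no longer matches the number of characters. The Big5 case is the only one where 2 octets map onto 2 columns. A minimal sketch, assuming well-formed UTF-8 and not taken from the zsh source, counts characters by skipping continuation bytes so the count can be compared with strlen(). The function name utf8_char_count is hypothetical.

```c
#include <stddef.h>

/* Count the characters in a NUL-terminated UTF-8 string.
 * Every octet that is not a continuation byte (10xxxxxx) starts a new
 * character. Assumes well-formed input; no validation is done. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)   /* not a continuation byte */
            n++;
    }
    return n;
}
```

For the Cyrillic letter Ж (U+0416, encoded as "\xd0\x96"), strlen() returns 2 while utf8_char_count() returns 1. For the Kanji 漢 (U+6F22, "\xe6\xbc\xa2"), the counts are 3 and 1.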

I imagine that 6-byte characters will generally take up
less horizontal space than 6 ASCII characters as well.
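The 6-byte case comes from the original UTF-8 definition (RFC 2279), where a lead byte can announce a sequence of up to six octets. RFC 3629 later limited UTF-8 to four. The following is a minimal sketch of reading the sequence length from the lead byte under the RFC 2279 rules. The function name utf8_seq_len is hypothetical.

```c
/* Length in octets of a UTF-8 sequence, judged from its lead byte,
 * using the original RFC 2279 rules (sequences of up to 6 octets).
 * Returns 0 for a continuation byte (10xxxxxx) or for 0xFE/0xFF,
 * which never begin a sequence. */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;   /* 0xxxxxxx: ASCII */
    if (lead < 0xC0) return 0;   /* 10xxxxxx: continuation byte */
    if (lead < 0xE0) return 2;   /* 110xxxxx */
    if (lead < 0xF0) return 3;   /* 1110xxxx */
    if (lead < 0xF8) return 4;   /* 11110xxx */
    if (lead < 0xFC) return 5;   /* 111110xx */
    if (lead < 0xFE) return 6;   /* 1111110x */
    return 0;                    /* 0xFE, 0xFF: invalid */
}
```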

> If we're going to support wide and/or multi-byte characters, I think we
> should Do It Right, not by pasting a zillion workarounds into individual
> editor functions.

I suspect that Doing It Right involves changing char *line to
wchar_t *wline, and modifying all dependencies accordingly.
Additionally, we'd need to figure out how many screen columns each
individual character occupies.

References:
- multibyte backwarddeletechar
  - From: Clint Adams
- Re: multibyte backwarddeletechar
  - From: Bart Schaefer
