Zsh Mailing List Archive
Re: [BUG] ZLE character width with emoji presentation variation selectors in Unicode
- X-seq: zsh-workers 52922
- From: Mikael Magnusson <mikachu@xxxxxxxxx>
- To: Advait Maybhate <advait@xxxxxxxx>
- Cc: zsh-workers@xxxxxxx
- Subject: Re: [BUG] ZLE character width with emoji presentation variation selectors in Unicode
- Date: Fri, 10 May 2024 11:37:50 +0200
- Archived-at: <https://zsh.org/workers/52922>
- In-reply-to: <CAN+tYMf4fH2Lkww5nzAB24fGZ6uJAt7r_FpRcFocYpaYOD=1Yw@mail.gmail.com>
- List-id: <zsh-workers.zsh.org>
- References: <CAN+tYMf4fH2Lkww5nzAB24fGZ6uJAt7r_FpRcFocYpaYOD=1Yw@mail.gmail.com>
On Thu, May 9, 2024 at 4:46 PM Advait Maybhate <advait@xxxxxxxx> wrote:
>
> Hey folks!
>
>
> Wanted to file a bug report/get a discussion going on the best way to handle emoji variation selectors with Unicode characters.
>
>
> Metadata:
>
> Zsh version: zsh 5.9 (x86_64-apple-darwin23.0), OS version: macOS Sonoma 14.3.1
>
> Terminal: tested across Warp, Kitty, default Mac terminal, Alacritty, iTerm 2
>
>
> ZLE incorrectly treats characters followed by the emoji variation selector as 1 cell wide instead of 2 cells wide, causing off-by-one cursor movement issues in terminals that (correctly) treat them as 2 cells wide.
>
>
> This is most easily reproduced in Kitty (v0.34), which renders and calculates these emojis as 2 cells (most terminal emulators seem to incorrectly handle this case of Unicode).
>
>
> To repro:
>
> Paste the command “echo ☁️” into Kitty (the last character is U+2601 followed by U+FE0F). Note that this triggers bracketed paste mode in Zsh.
>
>
> Expected behavior:
>
> ZLE contains “echo ☁️”.
>
>
> Actual behavior:
>
> ZLE contains “eecho ☁️” (note the additional “e” at the beginning here - inverted colors from the bracketed paste). Confirmed that this is due to an off-by-one on the cursor instruction, from the PTY recording.
>
>
> Screenshot: link
>
>
> I’d love to discuss how to fix this for terminals that do respect variation selectors. One way to do this could be via a new `terminfo` entry, but I’d love to know what ZSH devs think! I’m an engineer building the Warp terminal, so I’d be happy to work on any terminal-side changes of this with `terminfo` (we actually use bracketed paste mode for all commands, to best support multiline commands with Warp's input editor)!
>
>
> Notably, Fish 3.6 seems to calculate the width correctly as 2 cells (this is what originally prompted my investigation, due to the Starship prompt - see fish-shell/issues/10461), along with Bash (using bracketed paste with Bash 5.2).
>
>
> I’ve seen 2017/msg00432, which is related to this but deals with U+FE0E, not U+FE0F.
Generally speaking, it is impossible to handle combining emoji: since
the specification allows the rendering to either combine or not
combine the glyphs, zsh cannot know how much space they will take up.
Of course, your problem isn't even about combining emoji, but as far
as I can see the same conceptual problem applies here: there is no way
for zsh to know what "render as an image" implies for glyph width; all
we can do is call wcwidth. I took a quick look at some Unicode emoji
standards pages and none of them even mention the word "width". If you
can find an authoritative part of the standard talking about emoji
width, feel free to link it... In my terminal your example renders as
1 glyph wide, which agrees with zsh's guess, and I don't get any
display errors.
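
To make the disagreement concrete, here is a minimal Python sketch. It is
not zsh's code and does not call the C library's wcwidth; it assumes a
simplified stand-in built on Python's unicodedata module, and the names
naive_cell_width, per_code_point_width and vs16_aware_width are
hypothetical. It compares a per-code-point width sum (the wcwidth-style
approach) with a sum that widens a narrow base character when U+FE0F
follows it, which is the behavior the report attributes to Kitty.

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16 (requests emoji presentation)


def naive_cell_width(ch):
    """Approximate per-code-point width, in the spirit of wcwidth.

    Assumption: marks and format characters take 0 cells, East Asian
    Wide/Fullwidth characters take 2, everything else takes 1.
    """
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def per_code_point_width(s):
    """Sum widths one code point at a time, ignoring context."""
    return sum(naive_cell_width(ch) for ch in s)


def vs16_aware_width(s):
    """Like per_code_point_width, but a narrow base followed by U+FE0F
    is counted as 2 cells (an assumed terminal policy, not a standard)."""
    total = 0
    prev = 0  # width of the most recent non-zero-width character
    for ch in s:
        if ch == VS16 and prev == 1:
            total += 1  # widen the preceding base from 1 cell to 2
            prev = 2
            continue
        w = naive_cell_width(ch)
        total += w
        if w:
            prev = w
    return total


line = "echo \u2601\ufe0f"  # the command from the report
print(per_code_point_width(line))  # 6
print(vs16_aware_width(line))      # 7
```

The one-cell difference between the two results is the same size as the
cursor offset described in the report. Which of the two numbers is
"right" depends on how the terminal renders the sequence, which is the
point made above.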
--
Mikael Magnusson