Hey folks!
I wanted to file a bug report and start a discussion on the best way to handle Unicode characters that are followed by an emoji variation selector.
Metadata:
Zsh version: zsh 5.9 (x86_64-apple-darwin23.0), OS version: macOS Sonoma 14.3.1
Terminal: tested across Warp, Kitty, default Mac terminal, Alacritty, iTerm 2
ZLE incorrectly treats a character followed by the emoji variation selector (U+FE0F) as 1 cell wide instead of 2, which causes off-by-one cursor movement in terminals that (correctly) treat it as 2 cells.
This is most easily reproduced in Kitty (v0.34), which renders and measures these emojis as 2 cells (most other terminal emulators seem to handle this case incorrectly).
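To make the mismatch concrete, here is a rough Python sketch of the two ways of measuring the line. It is not ZLE's or Kitty's actual code: `base_width` and `cell_width` are hypothetical names, the per-code-point rule is only an approximation in the style of `wcwidth()`, and the VS16 rule assumes that any narrow character followed by U+FE0F becomes 2 cells wide (a real implementation would presumably consult the list of valid emoji variation sequences).

```python
import unicodedata

VS16 = "\ufe0f"  # U+FE0F VARIATION SELECTOR-16 (emoji presentation)


def base_width(ch):
    """Approximate wcwidth()-style width of a single code point."""
    # Combining marks and format characters occupy no cell of their own.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide / Fullwidth characters occupy two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def cell_width(text, honor_vs16):
    """Sum the cell widths of text, optionally widening on VS16.

    Assumption (hypothetical rule): a narrow character followed by
    U+FE0F is rendered 2 cells wide when honor_vs16 is True.
    """
    total = 0
    prev = 0  # width of the most recent non-zero-width character
    for ch in text:
        if ch == VS16 and honor_vs16 and prev == 1:
            total += 1  # widen the preceding character from 1 to 2 cells
            prev = 2
            continue
        w = base_width(ch)
        total += w
        if w > 0:
            prev = w
    return total


if __name__ == "__main__":
    line = "echo \u2601\ufe0f"
    print(cell_width(line, honor_vs16=False))  # per-code-point measurement
    print(cell_width(line, honor_vs16=True))   # VS16-aware measurement
```

With the per-code-point measurement the line comes out one cell shorter than with the VS16-aware one, which is the off-by-one described above.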
To repro:
Paste the command “echo ☁️” into Kitty (the last character is U+2601 followed by U+FE0F). Note that the paste goes through Zsh's bracketed paste mode.
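In case the variation selector gets lost when copying from this message, the following Python sketch builds the exact input from its code points. The names (`CLOUD`, `VS16`, `describe`, and so on) are just illustrative; the start and end markers are the standard bracketed paste escape sequences that a terminal wraps around pasted text.

```python
# Illustrative names only; nothing here is taken from Zsh or Kitty.
CLOUD = "\u2601"  # U+2601 CLOUD
VS16 = "\ufe0f"   # U+FE0F VARIATION SELECTOR-16 (emoji presentation)

command = "echo " + CLOUD + VS16

# Bracketed paste: the terminal wraps pasted text in these sequences.
PASTE_START = "\x1b[200~"
PASTE_END = "\x1b[201~"
pasted = PASTE_START + command + PASTE_END


def describe(text):
    """List the code points in text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    print(command)                 # copy this output into the terminal
    print(describe(command[-2:]))  # the two code points after "echo "
    print(pasted.encode("utf-8"))  # bytes the shell receives on paste
```

Copying the first line of output and pasting it into the terminal should give the same input as the repro step above.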
Expected behavior:
ZLE contains “echo ☁️”.
Actual behavior:
ZLE contains “eecho ☁️” (note the additional “e” at the beginning; the inverted colors come from the bracketed paste). I confirmed from the PTY recording that this is due to an off-by-one in the cursor movement instruction.
Screenshot: link
I’d love to discuss how to fix this for terminals that do respect variation selectors. One way to do this could be via a new `terminfo` entry, but I’d love to know what the Zsh devs think! I’m an engineer building the Warp terminal, so I’d be happy to work on any terminal-side changes this would involve, including the `terminfo` part (we actually use bracketed paste mode for all commands, to best support multiline commands with Warp's input editor)!
Notably, Fish 3.6 seems to calculate the width correctly as 2 cells (this is what originally prompted my investigation, due to the Starship prompt - see fish-shell/issues/10461), as does Bash 5.2 (tested with bracketed paste).
I’ve seen 2017/msg00432, which is related to this, but it deals with U+FE0E (the text variation selector), not U+FE0F.
Thanks!
Best,
Advait