On Fri, May 10, 2024 at 11:37 AM Mikael Magnusson <mikachu@xxxxxxxxx> wrote:
>
> On Thu, May 9, 2024 at 4:46 PM Advait Maybhate <advait@xxxxxxxx> wrote:
> >
> > Hey folks!
> >
> >
> > Wanted to file a bug report/get a discussion going on the best way to handle emoji variation selectors with Unicode characters.
> >
> >
> > Metadata:
> >
> > Zsh version: zsh 5.9 (x86_64-apple-darwin23.0), OS version: macOS Sonoma 14.3.1
> >
> > Terminal: tested across Warp, Kitty, default Mac terminal, Alacritty, iTerm 2
> >
> >
> > ZLE incorrectly treats characters followed by the emoji variation selector as 1 cell wide instead of 2 cells, causing off-by-one cursor movement issues in terminals that (correctly) render them as 2 cells.
> >
> >
> > This is most easily reproduced in Kitty (v0.34), which renders and calculates these emojis as 2 cells (most terminal emulators seem to incorrectly handle this case of Unicode).
> >
> >
> > To repro:
> >
> > Paste the command “echo ☁️” into Kitty (the last character is U+2601 followed by U+FE0F). Note that this results in bracketed paste mode in Zsh.
> >
> >
> > Expected behavior:
> >
> > ZLE contains “echo ☁️”.
> >
> >
> > Actual behavior:
> >
> > ZLE contains “eecho ☁️” (note the additional “e” at the beginning here - inverted colors from the bracketed paste). Confirmed that this is due to an off-by-one on the cursor instruction, from the PTY recording.
> >
> >
> > Screenshot: link
> >
> >
> > I’d love to discuss how to fix this for terminals that do respect variation selectors. One way to do this could be via a new `terminfo` entry, but I’d love to know what the zsh devs think! I’m an engineer building the Warp terminal, so I’d be happy to work on any terminal-side changes for this with `terminfo` (we actually use bracketed paste mode for all commands, to best support multiline commands with Warp's input editor)!
> >
> >
> > Notably, Fish 3.6 seems to calculate the width correctly as 2 cells (this is what originally prompted my investigation, due to the Starship prompt - see fish-shell/issues/10461), along with Bash (using bracketed paste with Bash 5.2).
> >
> >
> > I’ve seen 2017/msg00432, which is related to this but deals with U+FE0E, not U+FE0F.
>
> Generally speaking, it is impossible to handle combining emoji: since
> the specification allows the rendering to either combine or not
> combine the glyphs, it is not possible for zsh to know how much space
> they will take up. Of course, your problem isn't even about combining
> emoji, but as far as I can see the same conceptual problem applies
> here: there is no way for zsh to know what "render as an image"
> implies for glyph width; all we can do is call wcwidth.
I also meant to say: if wcwidth for the base glyph is 1, then adding a
combining character with a width of 0 after it will not magically
change the width of the base glyph, and cannot do so.
https://www.unicode.org/reports/tr51/ does mention that "Current
practice is for emoji to have a square aspect ratio, deriving from
their origin in Japanese. For interoperability, it is recommended that
this practice be continued with current and future emoji. They will
typically have about the same vertical placement and advance width as
CJK ideographs." but zsh cannot have some custom tables of emoji
widths; either wcwidth works correctly or it doesn't.
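
To make the two width models in this thread concrete, here is a rough
sketch in Python. It is not zsh's code and does not call the C library's
wcwidth; codepoint_width() is a hypothetical stand-in built on Python's
unicodedata tables, and vs16_aware_width() is only an assumption about
what a terminal that widens on U+FE0F might do.

```python
import unicodedata

VS16 = "\uFE0F"  # emoji variation selector discussed in this thread


def codepoint_width(ch: str) -> int:
    """Hypothetical stand-in for wcwidth(): 0, 1 or 2 cells per code point."""
    # Combining marks (which include variation selectors) and format
    # characters take no cell of their own.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth code points take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def per_codepoint_width(s: str) -> int:
    """Sum the width of each code point independently (the wcwidth model)."""
    return sum(codepoint_width(ch) for ch in s)


def vs16_aware_width(s: str) -> int:
    """Assumed terminal-side model: U+FE0F widens a 1-cell base to 2 cells.

    Simplified: a real implementation would also check that the base
    character actually has a defined emoji variation sequence.
    """
    total = 0
    prev = 0  # width assigned to the previous code point
    for ch in s:
        if ch == VS16 and prev == 1:
            total += 1  # promote the narrow base glyph to two cells
            prev = 2
        else:
            prev = codepoint_width(ch)
            total += prev
    return total


if __name__ == "__main__":
    cloud = "\u2601" + VS16
    print("per code point:", per_codepoint_width(cloud))
    print("VS16-aware:   ", vs16_aware_width(cloud))
```

In the first model the selector contributes 0 and cannot change the
width of the base glyph; in the second the pair always comes out as 2
cells. Whenever the two numbers differ for a string, the shell and the
terminal disagree about where the cursor is.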
--
Mikael Magnusson