Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: ZWJ paste from clipboard problem (unicode)



Vincent Lefevre wrote:
> > So zsh would need to either depend on some additional external library
> > or we need to fetch tables of unicode code points and generate our own
> > tables to classify them.
>
> If you mean hard-coded tables (not based on what wcwidth provides),
> this is a bad idea. This was what GNU Screen was doing and led to
> recurrent display issues, until this got replaced by wcwidth:

There's nothing wrong with tables per se. Even wcwidth() is probably
using a table underneath. You're going to get display issues whenever
the terminal and the programs running in it have different ideas about
the width of characters. That is why terminals have decided to use
wcwidth by default and invented "mode 2027", so that programs running
in the terminal can actually support newer characters. Even with
wcwidth(), you'll have discrepancies when you ssh between new and old
systems. But if zsh (or screen) were to use its own tables only after
querying and enabling mode 2027, it would potentially resolve problems
such as the one that started this thread, at least on some newer
terminals.
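
To make the "query, then enable" step concrete, here is a rough sketch in
C. It assumes the terminal answers the usual DEC private mode request
(DECRQM, ESC [ ? 2027 $ p) with a report of the form
ESC [ ? 2027 ; Ps $ y. The names parse_decrpm() and mode_usable() are
made up for illustration and are not existing zsh functions.

```c
#include <assert.h>
#include <stdio.h>

/* Status values carried in a DECRPM report (the Ps parameter). */
enum mode_status {
    MODE_BAD_REPLY = -1,       /* reply could not be parsed */
    MODE_NOT_RECOGNIZED = 0,
    MODE_SET = 1,
    MODE_RESET = 2,
    MODE_PERMANENTLY_SET = 3,
    MODE_PERMANENTLY_RESET = 4
};

/* Sequences a program would write to the terminal. */
static const char query_2027[]  = "\033[?2027$p";  /* DECRQM: ask for state */
static const char enable_2027[] = "\033[?2027h";   /* DECSET: turn mode on */

/* Parse a reply of the form ESC [ ? <mode> ; <status> $ y. */
static int parse_decrpm(const char *reply, int mode)
{
    int got_mode = 0, status = 0, consumed = 0;

    assert(reply != NULL);
    if (sscanf(reply, "\033[?%d;%d$y%n", &got_mode, &status, &consumed) != 2)
        return MODE_BAD_REPLY;
    /* %n is only stored if the trailing "$y" matched as well. */
    if (consumed == 0 || got_mode != mode || status < 0 || status > 4)
        return MODE_BAD_REPLY;
    return status;
}

/* The mode is worth relying on if it is on, or can be switched on. */
static int mode_usable(int status)
{
    return status == MODE_SET || status == MODE_RESET ||
           status == MODE_PERMANENTLY_SET;
}
```

A caller would write query_2027, read the reply with a short timeout (old
terminals simply stay silent), and send enable_2027 and switch to its own
tables only if mode_usable() is true; otherwise it would stay with
wcwidth().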

Zsh itself has had a --enable-unicode9 configure option since 2016 and
if you look at Src/wcwidth9.h, that's just hard-coded tables to classify
characters. I've never used it. It would be good to know whether it
could now be removed, given that Unicode is now up to version 16. My
understanding was that it was only ever useful with a particular macOS
vintage.
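
For anyone who hasn't looked at this kind of code, a table-based
classifier generally amounts to a sorted array of code point ranges plus
a binary search. The following is only a sketch of that general
technique, not the contents of Src/wcwidth9.h; the handful of ranges is
an illustrative excerpt, and treating everything unlisted as one cell is
an assumption made to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct width_range {
    uint32_t first, last;   /* inclusive code point range */
    int width;              /* terminal cells occupied */
};

/* Illustrative excerpt only; must stay sorted and non-overlapping. */
static const struct width_range width_table[] = {
    { 0x0300,  0x036F,  0 },  /* combining diacritical marks */
    { 0x1100,  0x115F,  2 },  /* Hangul Jamo initial consonants */
    { 0x200B,  0x200F,  0 },  /* zero-width characters, incl. ZWJ U+200D */
    { 0x4E00,  0x9FFF,  2 },  /* CJK unified ideographs */
    { 0x1F600, 0x1F64F, 2 },  /* emoticons */
};

/* Binary search over the ranges; unlisted code points default to 1. */
static int table_width(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof width_table / sizeof width_table[0];

    assert(cp <= 0x10FFFF);   /* outside this there are no code points */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < width_table[mid].first)
            hi = mid;
        else if (cp > width_table[mid].last)
            lo = mid + 1;
        else
            return width_table[mid].width;
    }
    return 1;   /* assumption: anything not in the table is one cell */
}
```

The lookup itself is trivial; the maintenance burden is entirely in
keeping the table in step with new Unicode versions.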

From a quick glance at the utf8proc sources, they're downloading
data files from https://www.unicode.org/Public/ and processing them
to produce C headers. If we don't want to add a library dependency,
generating our tables from that public data would at least let them
keep pace as the Unicode standard evolves. But it'd be duplicating a
lot of work that existing libraries already do.
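
As a rough idea of what such processing involves, the sketch below
parses one line in the style of EastAsianWidth.txt (assumed here to look
like "1100..115F ; W # comment" or "0020;Na # comment") and maps the
class to a cell count. The function names are hypothetical, this is not
how utf8proc does it, and zero-width characters would have to come from
other data files that this does not touch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one data line into an inclusive range and its width class.
 * cls must have room for 4 bytes. Returns 1 on success, 0 for
 * comments, blank lines and anything else that does not match.
 */
static int parse_eaw_line(const char *line, unsigned long *first,
                          unsigned long *last, char *cls)
{
    assert(line != NULL && first != NULL && last != NULL && cls != NULL);

    /* Range form: XXXX..YYYY ; Class */
    if (sscanf(line, "%lx..%lx ; %3[A-Za-z]", first, last, cls) == 3)
        return 1;
    /* Single code point form: XXXX ; Class */
    if (sscanf(line, "%lx ; %3[A-Za-z]", first, cls) == 2) {
        *last = *first;
        return 1;
    }
    return 0;
}

/* Wide (W) and fullwidth (F) take two cells; the rest is assumed to be one. */
static int eaw_cells(const char *cls)
{
    return (strcmp(cls, "W") == 0 || strcmp(cls, "F") == 0) ? 2 : 1;
}
```

A generator built on this would loop over the file, merge adjacent
ranges with the same cell count, and print them as initializers for a
sorted range table.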

> and there seem to be portability issues with libICU.

It has more issues besides this (the 28M data library and C++ dependency
for starters). I mentioned it because it is what most things use,
especially all the big stuff like web browsers, desktop environments,
office suites and language runtimes.

Oliver



