Zsh Mailing List Archive
Re: UNICODE Private Use Area characters in BUFFER
- X-seq: zsh-workers 50865
- From: Jun T <takimoto-j@xxxxxxxxxxxxxxxxx>
- To: zsh-workers@xxxxxxx
- Subject: Re: UNICODE Private Use Area characters in BUFFER
- Date: Fri, 4 Nov 2022 18:55:42 +0900
- Archived-at: <https://zsh.org/workers/50865>
- In-reply-to: <CAN=4vMohKT=CAx5XoSAHrDvx4J--58535h-ZgCn3dkSqvZKDZg@mail.gmail.com>
- List-id: <zsh-workers.zsh.org>
- References: <CAN=4vMowyKmrQtQb=QTxiVzQJXRubz-o2T12=6aQBHSpkKwOig@mail.gmail.com> <CAHYJk3SWfX7ZaFA=WgDBtSPZD0isV5OUHWgf3ienhzhzK+9xQw@mail.gmail.com> <CAN=4vMoLQBt8ST7E3EachnLra05ENPOiY0nDOC0Z_=a=8Mg4SA@mail.gmail.com> <CAH+w=7a-8TtMcXvrmq6RLHbU-maHdD1Zf2ck_h_kzK61bmEr_A@mail.gmail.com> <CAN=4vMohKT=CAx5XoSAHrDvx4J--58535h-ZgCn3dkSqvZKDZg@mail.gmail.com>
> 2022/10/24 2:29, Roman Perepelitsa <roman.perepelitsa@xxxxxxxxx> wrote:
>
> You are right, iswprint(0xE0B0) returns 0.
>
> I'm compiling zsh with --enable-unicode9, so instead of iswprint() it
> goes into u9_iswprint(). This function explicitly handles this case
> and returns 0, just like iswprint(). So we get this:
>
> WCWIDTH(0xE0B0) => 1
> WC_ISPRINT(0xE0B0) => 0
I think iswprint(0xe0b0) (or WC_ISPRINT()) returns 1 (in a UTF-8 locale).
The reason that it doesn't work in Zle seems to be in Zle/zle_refresh.c:
    #ifdef MULTIBYTE_SUPPORT
            else if (
    #ifdef __STDC_ISO_10646__
                     !ZSH_INVALID_WCHAR_TEST(*t) &&
    #endif
                     WC_ISPRINT(*t) && (width = WCWIDTH(*t)) > 0) {
__STDC_ISO_10646__ is defined on (probably all) Linux systems (but not on
macOS), and ZSH_INVALID_WCHAR_TEST() is defined in Zle/zle.h:
    /* The start of the private range we use, for 256 characters */
    #define ZSH_INVALID_WCHAR_BASE (0xe000U)
    /* Detect a wide character within our range */
    #define ZSH_INVALID_WCHAR_TEST(x) \
        ((unsigned)(x) >= ZSH_INVALID_WCHAR_BASE && \
         (unsigned)(x) <= (ZSH_INVALID_WCHAR_BASE + 255u))
ZSH_INVALID_WCHAR_TEST() returns true for a wide character wc in the
range 0xe000 <= wc <= 0xe0ff. It seems zsh assumes that this range
is not used by users and uses it to represent "invalid" (or incomplete)
characters (see line 452 in Zle/zle_utils.c).
If characters in this range need to be output as is, then we need an
option or something similar to disable this feature.
On macOS __STDC_ISO_10646__ is not defined (I think this is a bug in
macOS), and the character U+E0B0 is output as is. But on a standard
macOS installation no font has a glyph for this character, and
it is rendered as "a square with ? inside" (double width).
If you install a font that has a glyph for this character, and if the
glyph is single width, then I guess it will work OK in Zle.