Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
Re: sh emulation POSIX non-conformances (printf %10s and bytes vs character)
- X-seq: zsh-workers 48540
- From: Stephane Chazelas <stephane@xxxxxxxxxxxx>
- To: Daniel Shahaf <d.s@xxxxxxxxxxxxxxxxxx>
- Cc: Zsh hackers list <zsh-workers@xxxxxxx>
- Subject: Re: sh emulation POSIX non-conformances (printf %10s and bytes vs character)
- Date: Tue, 13 Apr 2021 19:03:49 +0100
- Archived-at: <https://zsh.org/workers/48540>
- In-reply-to: <20210413155744.GS6819@tarpaulin.shahaf.local2>
- List-id: <zsh-workers.zsh.org>
- Mail-followup-to: Daniel Shahaf <d.s@xxxxxxxxxxxxxxxxxx>, Zsh hackers list <zsh-workers@xxxxxxx>
- References: <7FD930F4-37CD-402B-9A06-893818856199@dana.is> <CAH+w=7aAZKpT0f5LT7RaoCehyO6UZe6FimzuQqOP4o=+EwZs2w@mail.gmail.com> <F56FD538-0428-4D03-BBE2-6E53154EC0EA@dana.is> <CAH+w=7a6sjNJsDv4KJyW-o45+Q7GNEp7_TL4LGd-os1ozF8T9A@mail.gmail.com> <20210411175726.hxnm33mxoska2tsm@chazelas.org> <20210411194205.e7mr2wx33wlkq3rs@chazelas.org> <20210413155744.GS6819@tarpaulin.shahaf.local2>
2021-04-13 15:57:44 +0000, Daniel Shahaf:
> Stephane Chazelas wrote on Sun, Apr 11, 2021 at 20:42:05 +0100:
> > Another POSIX bug fixed by zsh (but which makes it non-compliant):
> >
> > With multibyte characters:
> >
> > $ printf '|%10s|\n' Stéphane Chazelas
> > |  Stéphane|
> > |  Chazelas|
> >
> > POSIX requires:
> >
> > | Stéphane|
> > |  Chazelas|
> >
> > (with a UTF-8 é encoded on 2 bytes)
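The one-space difference above comes down to what the field width counts. Below is a minimal Python sketch, not zsh's or any printf's actual code. The helpers pad_bytes and pad_chars are hypothetical names, and the sketch assumes UTF-8 and that one codepoint counts as one character.

```python
# Sketch only: hypothetical helpers contrasting two ways of counting a
# printf-style field width. Assumes UTF-8, one character per codepoint.

def pad_bytes(s: str, width: int) -> str:
    """Right-align s, counting its width in UTF-8 bytes (what POSIX specifies)."""
    nbytes = len(s.encode("utf-8"))
    return " " * max(0, width - nbytes) + s


def pad_chars(s: str, width: int) -> str:
    """Right-align s, counting its width in codepoints (character-based)."""
    return " " * max(0, width - len(s)) + s


for name in ("St\u00e9phane", "Chazelas"):
    # "St\u00e9phane" is 8 codepoints but 9 bytes, so the byte-counting
    # version gets one space less padding; "Chazelas" is pure ASCII.
    print(f"|{pad_bytes(name, 10)}|  |{pad_chars(name, 10)}|")
```

Running it prints "| Stéphane|  |  Stéphane|" and "|  Chazelas|  |  Chazelas|". The left column reproduces the POSIX output and the right column reproduces zsh's.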
>
> Note that e-with-acute has two encodings in Unicode:
>
> é (U+00E9), one codepoint, two UTF-8 bytes
> é (U+0065 U+0301), two codepoints, three UTF-8 bytes
>
> https://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
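To make the two encodings concrete, here is a small Python sketch using the standard unicodedata module. It builds both normal forms of "Stéphane" and reports their codepoint and byte counts. The example string is the only assumption.

```python
import unicodedata

# The two encodings of e-with-acute: precomposed U+00E9 (NFC) versus
# "e" + U+0301 COMBINING ACUTE ACCENT (NFD). Both display as 8 columns.
nfc = unicodedata.normalize("NFC", "St\u00e9phane")
nfd = unicodedata.normalize("NFD", "St\u00e9phane")

for form, s in (("NFC", nfc), ("NFD", nfd)):
    print(form, "codepoints:", len(s), "UTF-8 bytes:", len(s.encode("utf-8")))

# Normalizing the NFD form back to NFC recovers the precomposed string.
assert unicodedata.normalize("NFC", nfd) == nfc
```

It prints "NFC codepoints: 8 UTF-8 bytes: 9" and "NFD codepoints: 9 UTF-8 bytes: 10". A codepoint-counting %10s therefore leaves only one space of padding for the NFD spelling. That is the same visual shortfall that byte counting produces for the NFC spelling.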
That was already shown in the part of my message you didn't
quote, where I pointed out how ksh93 addresses it with its printf
%Ls (zsh also has ${(ml[10])var} for that, though).
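Both fixes mentioned here pad by how many columns the text occupies on screen, rather than by bytes or codepoints. The following is a rough Python sketch of that idea, not either shell's implementation. The names display_width and pad_columns are hypothetical, and the width rules are an assumed approximation of wcwidth().

```python
import unicodedata

# Rough display-width padding in the spirit of ksh93's %Ls or zsh's
# ${(ml[10])var}. Approximation (assumption): nonspacing/enclosing marks
# (categories Mn, Me) take 0 columns, East Asian Wide/Fullwidth characters
# take 2, everything else 1. Real wcwidth() tables differ in corner cases.

def display_width(s: str) -> int:
    width = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue  # combining mark: drawn over the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_columns(s: str, width: int) -> str:
    # Left-pad to `width` columns; unlike zsh's l flag, never truncates.
    return " " * max(0, width - display_width(s)) + s


for s in ("St\u00e9phane", "Ste\u0301phane", "\u65e5\u672c"):
    print(f"|{pad_columns(s, 10)}|")
```

Both spellings of "Stéphane" now get the same two spaces of padding. The two wide CJK characters count as four columns.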
See also:
https://unix.stackexchange.com/questions/350240/why-is-printf-shrinking-umlaut
Cheers,
Stephane