Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: sh emulation POSIX non-conformances (printf %10s and bytes vs character)



Stephane Chazelas wrote on Sun, Apr 11, 2021 at 20:42:05 +0100:
> Another POSIX bug fixed by zsh (but which makes it non-compliant):
> 
> With multibyte characters:
> 
> $ printf '|%10s|\n' Stéphane Chazelas
> |  Stéphane|
> |  Chazelas|
> 
> POSIX requires:
> 
> | Stéphane|
> |  Chazelas|
> 
> (with a UTF-8 é encoded on 2 bytes

Note that e-with-acute has two encodings in Unicode:

é (U+00E9, precomposed, NFC): one codepoint, two UTF-8 bytes
é (U+0065 U+0301, decomposed, NFD): two codepoints, three UTF-8 bytes

https://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
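For anyone who wants to check the byte counts, here is a minimal sketch in portable sh. It writes each form using octal escapes, so it does not depend on how your terminal or editor normalizes the input, and it counts bytes with wc -c. The variable names are only illustrative. It says nothing about what any particular shell's printf does with %10s; it only shows how much padding a byte-counting implementation would have left to add.

```sh
# UTF-8 byte counts for the two encodings of e-with-acute.
# $(( ... )) strips the leading blanks that some wc implementations print.
nfc=$(( $(printf '\303\251' | wc -c) ))    # U+00E9 (precomposed)
nfd=$(( $(printf 'e\314\201' | wc -c) ))   # U+0065 U+0301 (decomposed)

# The same name spelled with each encoding.
name_nfc=$(( $(printf 'St\303\251phane' | wc -c) ))
name_nfd=$(( $(printf 'Ste\314\201phane' | wc -c) ))

# Padding left over in a 10-wide field if the width is counted in bytes.
pad_nfc=$(( 10 - name_nfc ))
pad_nfd=$(( 10 - name_nfd ))

echo "precomposed: $nfc bytes, name $name_nfc bytes, padding $pad_nfc"
echo "decomposed:  $nfd bytes, name $name_nfd bytes, padding $pad_nfd"
```

With the decomposed spelling the name is already 10 bytes long, so a byte-counting printf would add no padding at all, even though the name still displays as eight characters.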



