I'm using the macOS 13.2.1 OS-provided zsh, version 5.8.1, which I understand isn't the latest and greatest of 5.9, so perhaps this bug has already been addressed.
In the 4-byte sequence as seen below ( defined via explicit octal codes ), under no Unicode scenario should 4 bytes be printed out via a command of printf %.1s, by design.
- The first byte of \377 \xFF is explicitly invalid under UTF-8 (even allowing up to 7-byte in the oldest of definitions).
- The 4-byte value is too large to constitute a single character under either endian of UTF-32.
- It's also not a pair of beyond-BMP UTF-16 surrogates either, regardless of endian
At best, if treated as UTF-16, of either endian, this 4-byte sequence represents 2 code points, in which case, only 2 bytes should be printed not 4.
My high-level understanding of printf %.1s is that it should output the first locale-valid character of the input string, and in its absence, output the first byte instead, if any, so setting LC_ALL=C or POSIX would defeat the purpose of this bug report.
The reproducible sample shell command below includes what the output from zsh built-in printf looks like, what the macOS built-in printf looks like, and what the gnu printf looks like, all else being equal. The testing shell was invoked via
invoked via
zsh --restricted --no-rcs --nologin --verbose -xtrace -f -c
In all 3 test scenarios, LC_ALL is explicitly cleared, while LANG is explicitly set to a widely used one.
The od used is the macOS one, not the gnu one.
To my best knowledge, the other printfs have produced the correct output.
Thanks for your time.
====================================================================
echo; echo "$ZSH_VERSION"; echo; uname -a; echo; LC_ALL= LANG="en_US.UTF-8" builtin printf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ; echo; LC_ALL= LANG="en_US.UTF-8" command printf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ; echo; LC_ALL= LANG="en_US.UTF-8" gprintf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ; echo;
+zsh:1> echo
+zsh:1> echo 5.8.1
5.8.1
+zsh:1> echo
+zsh:1> uname -a
Darwin m1mx4CT 22.3.0 Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000 arm64
+zsh:1> echo
+zsh:1> LC_ALL='' LANG=en_US.UTF-8 +zsh:1> printf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
+zsh:1> od -bacx
0000000 012 012 011 133 377 210 234 256 135 012 012
nl nl ht [ ? 88 9c ? ] nl nl
\n \n \t [ 377 210 234 256 ] \n \n
0a0a 5b09 88ff ae9c 0a5d 000a
0000013
+zsh:1> echo
+zsh:1> LC_ALL='' LANG=en_US.UTF-8 printf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
+zsh:1> od -bacx
0000000 012 012 011 133 377 135 012 012
nl nl ht [ ? ] nl nl
\n \n \t [ 377 ] \n \n
0a0a 5b09 5dff 0a0a
0000010
+zsh:1> echo
+zsh:1> LC_ALL='' LANG=en_US.UTF-8 gprintf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
+zsh:1> od -bacx
0000000 012 012 011 133 377 135 012 012
nl nl ht [ ? ] nl nl
\n \n \t [ 377 ] \n \n
0a0a 5b09 5dff 0a0a
0000010
+zsh:1> echo
zsh 5.8.1 (x86_64-apple-darwin22.0)