Zsh Mailing List Archive
zsh generates invalid UTF-8 encoding in the history
- X-seq: zsh-workers 39569
- From: Vincent Lefevre <vincent@xxxxxxxxxx>
- To: zsh-workers@xxxxxxx
- Subject: zsh generates invalid UTF-8 encoding in the history
- Date: Wed, 5 Oct 2016 13:48:48 +0200
- List-help: <mailto:zsh-workers-help@zsh.org>
- List-id: Zsh Workers List <zsh-workers.zsh.org>
- List-post: <mailto:zsh-workers@zsh.org>
- Mail-followup-to: zsh-workers@xxxxxxx
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
With Debian's zsh 5.2-5 + some patches, when I execute commands with
some particular Unicode characters, the UTF-8 sequences are rewritten
incorrectly in the history. For instance:
cventin:~> unicode ─
U+2500 BOX DRAWINGS LIGHT HORIZONTAL
UTF-8: e2 94 80 UTF-16BE: 2500 Decimal: &#9472; Octal: \022400
─
Category: So (Symbol, Other)
Unicode block: 2500..257F; Box Drawing
Bidi: ON (Other Neutrals)
But in the history, instead of getting e2 94 80, I get: e2 83 b4 80.
Concerning "e2 83 b4 80":
cventin:~> unicode --fromcp utf-8 -x e283b4
U+20F4 - No such unicode character name in database
UTF-8: e2 83 b4 UTF-16BE: 20f4 Decimal: &#8436; Octal: \020364
()
Uppercase: 20F4
Category: Cn (Other, Not Assigned)
Unicode block: 20D0..20FF; Combining Diacritical Marks for Symbols
and the 80 on its own is not a valid UTF-8 sequence.
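The two byte sequences above can be checked independently of the unicode tool. The following is only a minimal sketch in Python; the helper name describe is hypothetical and not part of zsh or any tool mentioned here. It decodes each sequence strictly as UTF-8 and reports either the code points or the first offending byte.

```python
def describe(raw: bytes) -> str:
    """Describe how `raw` decodes as strict UTF-8 (hypothetical helper)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that cannot be decoded.
        return f"invalid at offset {err.start}: byte 0x{raw[err.start]:02x}"
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


expected = bytes.fromhex("e29480")    # what should be in the history
observed = bytes.fromhex("e283b480")  # what is actually written

print(describe(expected))  # U+2500
print(describe(observed))  # invalid at offset 3: byte 0x80
```

The first three bytes of the observed sequence decode on their own to U+20F4, which matches the output of unicode above; the decoder then stops at the trailing 80.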
This breaks various tools that process the history (grep, lesspipe,
etc.): first because the expected character is no longer present,
and second because the invalid UTF-8 byte is not regarded as a
character. For instance:
cventin:~> grep -av '^.*$' .zhistory | tail -n 1 | hd
00000000 3a 20 31 34 37 35 36 36 36 34 31 38 3a 30 3b 75 |: 1475666418:0;u|
00000010 6e 69 63 6f 64 65 20 e2 83 b4 80 0a |nicode .....|
0000001c
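The grep command above depends on the current locale being a UTF-8 one. As a rough, locale-independent alternative, the same check can be sketched in Python. The function names invalid_utf8_lines and scan are hypothetical, and the second line of the sample data is made up for illustration; only the first line comes from the dump above.

```python
from pathlib import Path


def invalid_utf8_lines(data: bytes):
    """Yield (line_number, raw_line) for each line that is not valid UTF-8."""
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield number, line


def scan(path: str):
    """Return the invalid lines of a history file (path is an assumption)."""
    return list(invalid_utf8_lines(Path(path).read_bytes()))


# First entry: the bytes shown in the hex dump above.
# Second entry: a hypothetical, valid history line for contrast.
sample = b": 1475666418:0;unicode \xe2\x83\xb4\x80\n: 1475666419:0;ls\n"

for number, line in invalid_utf8_lines(sample):
    print(number, line.hex(" "))
```

On a real history file, scan(".zhistory") would return the same kind of (line number, raw bytes) pairs, so the affected entries can be located without relying on grep's handling of invalid input.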
--
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)