Zsh Mailing List Archive
Re: printf %q segfault
- X-seq: zsh-workers 39653
- From: Daniel Shahaf <d.s@xxxxxxxxxxxxxxxxxx>
- To: lolilolicon <lolilolicon@xxxxxxxxx>
- Subject: Re: printf %q segfault
- Date: Sun, 16 Oct 2016 16:03:12 +0000
- Cc: zsh-workers@xxxxxxx
- In-reply-to: <CAMtVo_N+O_-s-H4-ih=N=oQ5bTRVbgJ9Odk6JCMiv3E-Lzsnmw@mail.gmail.com>
- List-help: <mailto:zsh-workers-help@zsh.org>
- List-id: Zsh Workers List <zsh-workers.zsh.org>
- List-post: <mailto:zsh-workers@zsh.org>
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
- References: <CAMtVo_N+O_-s-H4-ih=N=oQ5bTRVbgJ9Odk6JCMiv3E-Lzsnmw@mail.gmail.com>
lolilolicon wrote on Sun, Oct 16, 2016 at 22:58:14 +0800:
> The following produces segmentation fault:
>
> printf '%q' 你
>
> produced with zsh 5.2.
>
> Ask if you need any more info.
With latest master it doesn't segfault, but it's not correct, either:
% printf '%q' 你 | xxd
0000000: 2427 5c33 3434 2724 275c 3237 3527 a0 $'\344'$'\275'.
The UTF-8 encoding of your character is E4 BD A0; however, the final byte
(0xA0) is output literally. Since a lone 0xA0 is not a valid UTF-8
sequence, my terminal renders it [if I remove the |xxd pipe] as a U+FFFD
REPLACEMENT CHARACTER instead.
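To double-check the byte values, here is a minimal sketch in portable shell. The helper names bytes_hex and is_continuation_byte are made up for illustration and are not part of zsh. The first dumps the bytes of a string as hex; the second tests whether a byte has the 10xxxxxx bit pattern of a UTF-8 continuation byte, which is why 0xA0 cannot stand on its own.

```shell
# Hypothetical helper: print the bytes of a string as lowercase hex.
bytes_hex() {
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
}

# Hypothetical helper: succeed if the given byte value has the bit
# pattern 10xxxxxx, i.e. it is a UTF-8 continuation byte.
is_continuation_byte() {
    [ "$(( $1 & 0xC0 ))" -eq "$(( 0x80 ))" ]
}

# The character is written as octal escapes so the script itself stays ASCII.
bytes_hex "$(printf '\344\275\240')"; echo
is_continuation_byte 0xA0 && echo "0xA0 is a continuation byte"
```

A continuation byte is only meaningful after a lead byte (here E4), so printing it alone leaves the terminal with nothing valid to decode.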
This also reproduces with «printf '%q\n' $'\U00A0'», which should print
either « » (a non-breaking-space) or «$'\302'$'\240'» (the quotestring()
representation of the UTF-8 encoding of U+00A0; that encoding is C2 A0).
Bottom line: the byte 0xA0 should not be printed literally but escaped.
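For illustration only, the fully escaped form can be sketched in portable shell. This is not zsh's quotestring(); quote_high_bytes is a hypothetical helper that assumes every byte at or above 0x80 gets its own $'\ooo' group, matching the style of the output shown above, and that ASCII bytes pass through unchanged.

```shell
# Hypothetical helper, NOT zsh's quotestring(): write each byte >= 0x80
# as its own $'\ooo' group and pass ASCII bytes through unchanged.
quote_high_bytes() {
    # od emits one three-digit octal number per input byte.
    for oct in $(printf '%s' "$1" | od -An -v -to1); do
        if [ "$(( 0$oct ))" -ge 128 ]; then
            printf '%s' "\$'\\${oct}'"
        else
            # %b expands \0ddd back into the original byte.
            printf '%b' "\\0$oct"
        fi
    done
}

quote_high_bytes "$(printf '\302\240')"; echo       # U+00A0
quote_high_bytes "$(printf '\344\275\240')"; echo   # the reporter's character
```

Under these assumptions the first call gives the $'\302'$'\240' form mentioned above, and the second ends in $'\240' rather than a raw 0xA0 byte.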
The reason 0xA0 is output literally is that the code takes the "if (itok(*u))"
branch in quotestring(); if it didn't take that branch, it'd behave
correctly.
Cheers,
Daniel