Zsh Mailing List Archive
Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
- X-seq: zsh-workers 52589
- From: Stephane Chazelas <stephane@xxxxxxxxxxxx>
- To: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
- Cc: zsh workers <zsh-workers@xxxxxxx>
- Subject: Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
- Date: Sat, 24 Feb 2024 09:47:22 +0000
- Archived-at: <https://zsh.org/workers/52589>
- In-reply-to: <CAH+w=7bTFowrTNu8rorLzSbQyW70oGuppYYvPdF40RTJk4bQ8w@mail.gmail.com>
- List-id: <zsh-workers.zsh.org>
- Mail-followup-to: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>, zsh workers <zsh-workers@xxxxxxx>
- References: <20240221194534.o2mufin7orng6ttg@chazelas.org> <CAH+w=7Z0Evb019EX=bLtgHh0UOPy1J-nUO5paz+AxDXTtVGNSw@mail.gmail.com> <20240221202150.tccftcqbxqqexq4x@chazelas.org> <CAH+w=7ah2tG=QOFDVirAm2PdeX4CXqqjjc+0JinOAG_jkgR6sQ@mail.gmail.com> <20240222072313.7woy5vxvt4fbxyhj@chazelas.org> <20240222075528.eruaoosiuhmcrdsy@chazelas.org> <CAH+w=7Z5CELddo2qJtiNWM2AmxkRoqKz3Pj9LdEGb1z_3SyqeQ@mail.gmail.com> <CAH+w=7YcXGPDNFaGA_onRnRmWPhrZ3ems5x_9amEaH8y2miWSA@mail.gmail.com> <20240223192717.tczrbc63fei7d4m2@chazelas.org> <CAH+w=7bTFowrTNu8rorLzSbQyW70oGuppYYvPdF40RTJk4bQ8w@mail.gmail.com>
2024-02-23 14:32:49 -0800, Bart Schaefer:
[...]
> > zsh: bad math expression: operand expected at `|aM-^C c'
>
> You're missing part of my point here.
>
> % printf '%d\n' $(( 1+|a\x83 c ))
> zsh: bad math expression: operand expected at `|a\x83 c '
>
> That is IMO more useful than either of "^@" or "M-^C " and is down to
> the difference between using printf on a $'...' string (which is
> interpreted before printf even gets its mitts on it, and two layers
> before the math parser does) vs. using the actual math parser
> directly. This has nothing to do with how the string is passed to
> zerr() and everything to do with how printf and parsing interpret the
> input -- by the time the math parser actually calls zerr() it can't
> know how to unwind that, and the internals of zerr() are even further
> removed.
>
> I would therefore argue that these examples are out of scope for this
> discussion -- these examples are not about how zerr() et al. should
> receive strings, they're about how the math parser should receive
> them, and needs to be fixed upstream e.g. in bin_print().
[...]
I agree the bug is in printf, which forgets to metafy the input
before passing it to the math parser. This can be seen with:
$ typeset -A a
$ printf '%d\n' 'a[ÃÃÃÃÃÃ]=1'
1
$ (( a[ÃÃÃÃÃÃ] = 2 ))
typeset -A a=( [ÃÃÃÃÃÃ]=2 [$'\M-C\M-c\M-c\M-c\M-c\M-c\M-\C-C']=1 )
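One way to see where the second key could come from is a rough Python model (not taken from the zsh source). Here `metafy` and `unmetafy` are simplified stand-ins for the C functions of the same name, and the set of bytes treated as special (NUL plus 0x83 through 0xa2) is an assumption that may differ between zsh versions. If the subscript bytes reach the parser without being metafied, a later unmetafy treats every 0x83 as an escape byte:

```python
META = 0x83  # zsh's Meta escape byte


def is_meta(b: int) -> bool:
    # Assumed set of bytes that need escaping: NUL, Meta, and the token range.
    return b == 0 or META <= b <= 0xA2


def metafy(raw: bytes) -> bytes:
    """Escape special bytes as Meta followed by (byte XOR 0x20)."""
    out = bytearray()
    for b in raw:
        if is_meta(b):
            out += bytes([META, b ^ 0x20])
        else:
            out.append(b)
    return bytes(out)


def unmetafy(s: bytes) -> bytes:
    """Undo metafy; a trailing Meta has nothing to decode and is kept."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == META and i + 1 < len(s):
            out.append(s[i + 1] ^ 0x20)
            i += 2
        else:
            out.append(s[i])
            i += 1
    return bytes(out)


# Six copies of U+00C3, i.e. the bytes c3 83 repeated six times.
key = ("\u00c3" * 6).encode("utf-8")

round_trip = unmetafy(metafy(key))  # metafied first, as the (( ... )) path does
skipped = unmetafy(key)             # metafy forgotten, as suspected for printf

print(round_trip.hex(" "))
print(skipped.hex(" "))
```

In this model the round trip returns the original twelve bytes, while the skipped case collapses to c3 e3 e3 e3 e3 e3 83, which matches the seven-byte key in the typeset output above.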
> More relevant to this discussion is that math errors are one of the
> two existing callers using the %l format, so any attempt to improve
> this is going to require changing those calls anyway.
I don't see why we'd need to change the call to zerr in those
cases. Just fix printf.
$ a=$'\x83 foobarbaz' b='\x83 foobarbaz'
$ (( 1+|$b ))
zsh: bad math expression: operand expected at `|\x83 foob...'
$ (( 1+|$a ))
zsh: bad math expression: operand expected at `|\M-^C foobarb...'
Both are correct: we do want the 0x83 byte, which is not printable,
to be rendered as \M-^C.
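As an illustration of that rendering, here is a small Python sketch of a byte-at-a-time "nice" printer. It is a simplified model, not the nicezputs implementation: it ignores multibyte locales and any shell options, and the names `nice_byte` and `nice` are made up for this example.

```python
def nice_byte(b: int) -> str:
    """Render one byte visibly (simplified single-byte model)."""
    out = ""
    if b & 0x80:
        out += "\\M-"   # high bit set: emit a prefix, then render the low 7 bits
        b &= 0x7F
    if b == 0x7F:
        return out + "^?"
    if b == 0x0A:
        return out + "\\n"
    if b == 0x09:
        return out + "\\t"
    if b < 0x20:
        return out + "^" + chr(b + 0x40)   # control byte, e.g. 0x03 -> ^C
    return out + chr(b)


def nice(data: bytes) -> str:
    return "".join(nice_byte(b) for b in data)


# $b holds the four printable characters backslash, x, 8, 3.
print(nice(b"|\\x83 foob"))
# $a holds one raw 0x83 byte.
print(nice(b"|\x83 foobarb"))
```

Under these assumptions the first line comes out unchanged as `|\x83 foob` and the second as `|\M-^C foobarb`, the same shapes as the two error messages above.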
>
> > For 1, IMO, when the error message is generated by zsh, it
> > should go through nicezputs(). zsh should decide of the
> > formatting, have it pass escape sequences as-is would make it
> > hard to understand and diagnose the error.
>
> Agreed in concept, but there's a difference between errors actually
> generated BY zsh, and errors with user input that zsh is reporting.
> For example, the same literal string might be a file name generated by
> globbing, or it might be something the user typed out in a
> syntactically invalid command. There's no way to put intelligence
> about how to format those into the guts of zerr().
I don't think that's a point of contention. All of those are
cases where we need to make the non-printable characters in the
user data visible with nicezputs.
The question is not user input vs. no user input in the
displayed error. It is only, for the cases that do include user
input, whether that input is meant to be an error message
formatted by the user. I can only think of
${var[:]?user-supplied-error}, and I imagine that at least 99% of
the 499 other cases are not about printing a user-supplied error
message.
> There's already a way to pass text not containing NUL (%s) and a way
> to pass text as ptr+len (%l). There are a vanishingly small number of
> uses of the latter (2 callers out of the ~500 total call examples).
> There's exactly one case so far of wanting output to contain NUL, and
> per the "only caller can interpret" assertion, it seems worthwhile to
> use %l for the NUL case and let the other 3 callers decide to "nice"
> the strings they pass (or not).
>
> This not only skips extra metafication needed to use the proposed %S,
> but also simplifies the implementation of %l, and requires the
> addition of only 1 or 2 lines of code to each of the two existing
> callers using %l (maybe zero lines of code in the case of yyerror()).
>
> > %S also passed metafied, but no nicezputs.
>
> That requires metafy in the caller followed by unmetafy in zerr().
> Much easier to remove code from %l than to add it to a new %S,
> especially given that we're editing the solitary caller where %S would
> be used.
But in the case of ${var?err}, the err is already metafied, so
if you make %l take unmetafied input, you're just moving the
unmetafication to the caller which is counterproductive as it
makes it break on NULs.
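The NUL point can be sketched in Python. This is only a model under stated assumptions: `metafy` is a simplified stand-in for the C function, the special-byte range is assumed, and `c_string` is a hypothetical helper that mimics what code reading a NUL-terminated buffer would see. A ptr+len interface would not have this limit, but then every caller has to carry the length.

```python
META = 0x83  # zsh's Meta escape byte


def metafy(raw: bytes) -> bytes:
    """Escape NUL, Meta and the (assumed) token range as Meta, byte XOR 0x20."""
    out = bytearray()
    for b in raw:
        if b == 0 or META <= b <= 0xA2:
            out += bytes([META, b ^ 0x20])
        else:
            out.append(b)
    return bytes(out)


def c_string(buf: bytes) -> bytes:
    """What a reader that stops at the first NUL would see of buf."""
    return buf.split(b"\0", 1)[0]


msg = b"bad\0value"

print(c_string(msg))           # unmetafied: cut short at the NUL
print(c_string(metafy(msg)))   # metafied: NUL became 0x83 0x20, nothing is lost
```

In this model the metafied form contains no zero byte at all, so it survives being handled as an ordinary C string.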
Also, %l is intended (at least in the one case I saw it used) to
truncate user input, so it should be nicezputs'ed.
>
> > Now, my previous message was showing there were quite a few
> > issue with the metafication and possibly with the nicezputs'ing
> > and/or multibyte handling.
>
> Fine, but not fixable in zerr() and friends.
Sorry for the confusion, I didn't mean to say that's where it
was to be fixed. I agree these are all cases where the caller
fails to do the metafication (in the case of printf, the
metafication was missing from much earlier).
[...]
> % printf '%d\n' '1+ÃÃÃÃÃÃ'
> BUG: unexpected end of string in ztrlen()
> zsh: bad math expression: operand expected at `\M-C\M-c\M-c\M-c\M-c\M-c'
> 0
> % printf '%d\n' $((1+ÃÃÃÃÃÃ))
> 1
>
> (Also a bit weird that the first \M-C is capitalized and the rest are
> not?) Still not a problem to be resolved in zerr().
\M-C is the visual representation of 0xc3, \M-c of 0xe3; ÃÃ is
c3 83 c3 83. It's just that unmetafy took each 83 as Meta and
turned 83 c3 into e3 (0xc3 XOR 0x20).
> > > $ ((1+|ÃÃÃÃÃÃ))
> > > zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...'
> >
> > In that case, metafication OK, but character cut in the middle.
>
> Still not zerr()'s fault and needs to be addressed where the number of
> bytes for %l is being calculated in checkunary().
zerr could try to decode the string as text and truncate to the
specified number of *characters* instead of bytes, but like I
said, that may be overkill as we can live with the odd character
cut in the middle.
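A cheaper middle ground than full decoding would be to keep the byte limit but back off to a character boundary. The Python sketch below shows the idea for UTF-8 only; `truncate_utf8` is a hypothetical helper, it assumes valid unmetafied UTF-8 input, and the 10-byte limit is inferred from the outputs quoted in this thread rather than read from the source.

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut data to at most limit bytes without splitting a UTF-8 character."""
    if len(data) <= limit:
        return data
    cut = limit
    # If the first dropped byte is a continuation byte (10xxxxxx), the cut
    # falls inside a character, so step back to that character's lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


# "|" followed by six copies of U+00C3 (bytes c3 83), 13 bytes in total.
expr = ("|" + "\u00c3" * 6).encode("utf-8")

naive = expr[:10]                 # ends with a lone c3 lead byte
safe = truncate_utf8(expr, 10)    # backs off to 9 bytes

print(naive.hex(" "))
print(safe.decode("utf-8"))
```

With a 10-byte limit the naive cut keeps four whole characters plus a dangling c3, which is the `\M-C` seen at the end of the message above, while the boundary-aware cut stops after the fourth character.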
> > > % ((1+|ÃÃÃÃÃÃ))
> > > zsh: bad math expression: operand expected at `|Ã?Ã?Ã?...'
> >
> > It seems rather worse to me.
>
> That's because of the way I chose to lift nice-ifying up into
> checkunary() for testing the approach. It's hard to be consistent
> there, given the foregoing business about different formats being sent
> down from printf vs. $((...)), and it's also why I said "no patch
> without feedback".
I guess those ? are some 0x83 bytes added by metafication, and
we're missing the corresponding unmetafy.
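That guess can be checked against a small Python model (an illustration only; `metafy` is a simplified stand-in, the special-byte range is assumed, and the 10-byte limit is inferred from the outputs quoted in this thread). It metafies the expression, truncates to 10 bytes, and then decodes the result as UTF-8 the way a terminal would, without unmetafying first:

```python
META = 0x83   # zsh's Meta escape byte
LIMIT = 10    # assumed %l truncation length, inferred from the quoted outputs


def metafy(raw: bytes) -> bytes:
    """Escape NUL, Meta and the (assumed) token range as Meta, byte XOR 0x20."""
    out = bytearray()
    for b in raw:
        if b == 0 or META <= b <= 0xA2:
            out += bytes([META, b ^ 0x20])
        else:
            out.append(b)
    return bytes(out)


# "|" followed by six copies of U+00C3 (bytes c3 83).
expr = ("|" + "\u00c3" * 6).encode("utf-8")

metafied = metafy(expr)   # each c3 83 becomes c3 83 a3
shown = metafied[:LIMIT].decode("utf-8", errors="replace")

print(shown.replace("\ufffd", "?") + "...")
```

In this model each character gains one extra byte. The inserted Meta (0x83) pairs with the preceding c3 and still displays as the original character, and the escaped byte (0xa3) is left over as an invalid sequence, giving one `?` per character and the same `|Ã?Ã?Ã?...` shape as the message above.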
To me, the only things to do are:
1. Add a %S for raw output (expects metafied input like
everything else) to be used by ${var[:]?error} and likely only
those.
2. Add missing metafy in bin_print (and possibly elsewhere)
before calling the math parser
3. Fix those cases where zerrmsg is called with non-metafied
%s/%l/%S arguments, like in that "bad interpreter" case
above.
4. (optional): Improve %l usages to truncate based on number of
characters rather than bytes or at least avoid cutting
characters in the middle.
--
Stephane