Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Cores almost on demand in patcompile()



On Mon, 10 Oct 2016 19:46:18 -0700
Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> On Mon, Oct 10, 2016 at 8:31 AM, Sebastian Gniazdowski <
> sgniazdowski@xxxxxxxxx> wrote:
> 
> > My new observations:
> > – the "ndash", i.e. this char: >>> – <<<, has a role, because
> > replacing it with other char, also one like §, stops core dump
> >
> 
> Your stack trace in the first message on this thread has a string as the
> "exp" argument of patcompile() that makes me suspicious.  The comments say
> that this argument is expected to be metafied (pattern.c 522), but as best
> I can tell it's passed down from paramsubst() tokenized but not metafied.
> (I don't have access to my usual debugging platform this week.)  One of the
> multibyte characters in the string
>...
> in that exp argument, the character in the position where you
> identified ndash in input2b.txt, has a byte with hex value 0x83 which would
> cause it to be incorrectly interpreted as a metacharacter.  If this is the
> case, this probably results in the pattern being mishandled.
> 
> There were some changes made a while ago to try to optimize memory
> (re)allocation during pattern compilation and globbing.  It's quite
> possible that miscounting of the number of characters in the pattern is
> causing problems with the allocated buffer.  At this point though, I'm just
> speculating.

That comes from "s" within paramsubst(), which is just the minimally
modified parameter substitution string from the command line, which
should still be both metafied and containing tokens (the ability to
quote characters that look like tokens being the main use of
metafication).  It's not the value of the parameter, which goes a
different way.  The pattern just gets compiled once.

Here's an xxd of the complete expression from the backtrace --- the
exp=" and the final double quote before the newline are therefore not
part of the string (this is also a bit confusing since I've left in the
\x.. escapes put in by the debugger but they aren't immediately
relevant):

0000000: 6578 703d 225c 7838 385c 7838 346d 5c78  exp="\x88\x84m\x
0000010: 3861 5c78 3031 676c 6f67 425c 7830 3130  8a\x01glogB\x010
0000020: 5c78 3031 2f44 6966 662f 5c78 3031 2f55  \x01/Diff/\x01/U
0000030: 7365 7273 2f73 676e 6961 7a64 6f77 736b  sers/sgniazdowsk
0000040: 692f 6769 7468 7562 2f7a 7368 2d6e 6176  i/github/zsh-nav
0000050: 6967 6174 696f 6e2d 746f 6f6c 732e 6769  igation-tools.gi
0000060: 745c 7830 3130 6138 3763 3830 3434 6331  t\x010a87c8044c1
0000070: 6336 3637 3761 3338 6363 6461 6264 3336  c6677a38ccdabd36
0000080: 3333 3463 6332 6161 6162 3439 365c 7830  334cc2aaab496\x0
0000090: 325c 7830 365c 227a 6e74 2d74 6d75 782e  2\x06\"znt-tmux.
00000a0: 7a73 680a e280 83ef bfbd 2069 6e69 7469  zsh....... initi
00000b0: 616c 2063 6f6d 6d69 7420 6f66 2073 6b79  al commit of sky
00000c0: 6c69 7465 3231 2073 7562 6d69 7373 696f  lite21 submissio
00000d0: 6e5c 225c 7831 3922 0a                   n\"\x19".

The dodgy sequence is around "e2 80 83 ef bf bd" from byte 0xa4.  That
83 is indeed a Meta, and it looks like it's the only one in the input.
Unmetafying the following ef (ef XOR 20) gives cf, which is not in the
range that would have caused it to be metafied.
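
The arithmetic above can be checked with a minimal Python sketch of
unmetafication.  It assumes the usual zsh convention that Meta (0x83) is
followed by the original byte XORed with 0x20, and that the bytes needing
metafication are NUL plus 0x83 through 0xa2; the function names are
stand-ins for illustration, not zsh's C code.

```python
META = 0x83  # zsh's Meta marker byte


def needs_meta(b: int) -> bool:
    # Assumption: metafied bytes are NUL plus Meta (0x83) through 0xa2.
    return b == 0x00 or 0x83 <= b <= 0xA2


def unmetafy(data: bytes) -> bytes:
    # Drop each Meta byte and XOR the byte that follows it with 0x20.
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == META and i + 1 < len(data):
            out.append(data[i + 1] ^ 0x20)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


dodgy = bytes.fromhex("e28083efbfbd")  # bytes from offset 0xa4 in the dump
print(unmetafy(dodgy).hex(" "))        # e2 80 cf bf bd
print(needs_meta(0xCF))                # False
```

Since 0xcf is outside the assumed range, a correct metafier would never
have produced "83 ef" in the first place.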

So it looks like this did indeed get previously unmetafied when it
shouldn't have.  The question is where...  If this is the case, the crash
could be to do with the lack of null termination rather than
misinterpretation of characters.

The text in that pattern has come from the $word in Sebastian's script:

buf="${buf#(#m)$word}"

Note that (barring accidents with /etc/zshenv) GLOB_SUBST is not on, so
$word should be substituted literally.  This came from:

word="$(<input2b.txt)"

which contains evidently valid multibyte characters (they're fine in the
output I'm seeing but I don't dare mail them raw).  xxd gives

0000000: 4167 6c6f 6742 4130 412f 4469 6666 2f41  AglogBA0A/Diff/A
0000010: 2f55 7365 7273 2f73 676e 6961 7a64 6f77  /Users/sgniazdow
0000020: 736b 692f 6769 7468 7562 2f7a 7368 2d6e  ski/github/zsh-n
0000030: 6176 6967 6174 696f 6e2d 746f 6f6c 732e  avigation-tools.
0000040: 6769 7441 3061 3837 6338 3034 3463 3163  gitA0a87c8044c1c
0000050: 3636 3737 6133 3863 6364 6162 6433 3633  6677a38ccdabd363
0000060: 3334 6363 3261 6161 6234 3936 4246 227a  34cc2aaab496BF"z
0000070: 6e74 2d74 6d75 782e 7a73 6820 e280 9320  nt-tmux.zsh ... 
0000080: 696e 6974 6961 6c20 636f 6d6d 6974 206f  initial commit o
0000090: 6620 736b 796c 6974 6532 3120 7375 626d  f skylite21 subm
00000a0: 6973 7369 6f6e 2259 0a                   ission"Y.

Here we have "e2 80 93" where we previously had "e2 80 83 ef bf bd". The
93 becomes 83 b3 on metafication (Meta, then 93 XOR 20), which should give
"e2 80 83 b3", which isn't what we've got.
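
The expected metafied form can be reproduced with a short Python sketch,
again assuming Meta is 0x83, the XOR value is 0x20 and the metafied range is
NUL plus 0x83 through 0xa2.  Purely for comparison with the earlier dump, it
also shows what a UTF-8 decoder that substitutes U+FFFD for invalid bytes
would make of the result.  Whether any such decoding happened between the
debugger and the mail is an assumption that has not been verified here.

```python
META = 0x83  # zsh's Meta marker byte


def needs_meta(b: int) -> bool:
    # Assumption: metafied bytes are NUL plus Meta (0x83) through 0xa2.
    return b == 0x00 or 0x83 <= b <= 0xA2


def metafy(data: bytes) -> bytes:
    # Replace each special byte with Meta followed by (byte XOR 0x20).
    out = bytearray()
    for b in data:
        if needs_meta(b):
            out += bytes([META, b ^ 0x20])
        else:
            out.append(b)
    return bytes(out)


def as_displayed(data: bytes) -> bytes:
    # Hypothetical lossy step: decode as UTF-8 with replacement, re-encode.
    return data.decode("utf-8", errors="replace").encode("utf-8")


ndash = bytes.fromhex("e28093")       # the character from input2b.txt
metafied = metafy(ndash)
print(metafied.hex(" "))              # e2 80 83 b3
print(as_displayed(metafied).hex(" "))  # e2 80 83 ef bf bd
```

In the second line of output, "e2 80 83" survives because it happens to be
valid UTF-8, while the lone b3 is replaced by "ef bf bd", the UTF-8 encoding
of U+FFFD.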

The function readoutput() that inputs the word does do metafication.
That's no great surprise, as part of the mayhem is evidently due to the
fact that it has been metafied at some point.  I've confirmed by stepping
through that this gives 83 b3 [except gdb helpfully says \203\263, but
that's what you get with old technology].
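
For comparing gdb's octal escapes against the hex in the xxd dumps, a small
Python helper is enough.  This is only a sketch: the function name is made
up for illustration, and it handles three-digit octal escapes only, not
gdb's other escapes such as \" or \\.

```python
import re


def gdb_octal_to_bytes(text: str) -> bytes:
    # Turn each \NNN octal escape into the byte it stands for; every other
    # character is kept as-is (assumed to be plain ASCII/Latin-1).
    decoded = re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), text)
    return decoded.encode("latin-1")


print(gdb_octal_to_bytes(r"\203\263").hex(" "))      # 83 b3
print(gdb_octal_to_bytes(r"\210\204m\212").hex(" "))  # 88 84 6d 8a
```

The second line matches the \x88\x84m\x8a at the start of the exp string in
the backtrace dump.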

However, when I hit getmatch() I've still got what I expect, the
correctly metafied input string with \203\263:

$11 = 0xb78a8d70 "\210\204m\212\001glogB\001\060\001/Diff/\001/Users/sgniazdowski/github/zsh-navigation-tools.git\001\060a87c8044c1c6677a38ccdabd36334cc2aaab496\002\006\"znt-tmux.zsh  \263 initial commit of skylite21 submission\"\031"

So some option / variant behaviour / extraneous memory effect is coming
into this.

pws


