Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
TR : Re : [BUG] Crash due to malloc call in signal handler
- X-seq: zsh-workers 45019
- From: "Antoine C." <acalando@xxxxxxx>
- To: Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
- Subject: TR : Re : [BUG] Crash due to malloc call in signal handler
- Date: Fri, 13 Dec 2019 18:09:33 +0100 (CET)
- Cc: zsh-workers@xxxxxxx
- List-help: <mailto:zsh-workers-help@zsh.org>
- List-id: Zsh Workers List <zsh-workers.zsh.org>
- List-post: <mailto:zsh-workers@zsh.org>
- List-unsubscribe: <mailto:zsh-workers-unsubscribe@zsh.org>
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
(and now zsh mail server is returning back all my mails
to the list (????), so I am replying directly to you...)
---
(sorry to fork from the original thread, I had not subscribed
to the list and cannot reply-all the last mail)
So Peter what I understand from your mails is that malloc
functions are called from signal handler on purpose, but
only at the time you think is right, so even if it is
forbidden, it should work... Well, it does not. There is
a backtrace at the end of the mail showing it clearly, but
we can dive into the details to understand what is happening.
Here I am compiling zsh and glibc from source with some
custom traces, with mcheck enabled to identify with more
ease when a corruption occurs. I am using a special script
spawning a lost of children, and the crash occurs within
seconds (after I spent a lot of time finding the right
delays in this script).
You may want to refer to malloc() and mallochook():
https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c#l3022
https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/mcheck.c#l201
As an example, here is the output of the traces with a
normal allocation:
TIMESTAMP TS DIFF IDX TID MY_ID PARAM CALLER MORE OR LESS CURRENT LINE
14632e2c36f5e ( +12919) 93b ___ 2e6e _____ a0 ___ 7ffff7317f90 |__GI__IO_file_doallocate at glibc/libio/filedoalloc.c:101 | __GI___libc_malloc at glibc/malloc/malloc.c:3037 <= malloc called
14632e2c37310 ( +946) 93c ___ 2e6e _ c0c005 ______________ 0 | __GI___libc_malloc at glibc/malloc/malloc.c:3090 | mallochook at glibc/malloc/mcheck.c:289 <= hook remove the hook
14632e2c3762f ( +799) 93d ___ 2e6e _____ a0 ______________ 0 | mallochook at glibc/malloc/mcheck.c:295 | __GI___libc_malloc at glibc/malloc/malloc.c:3037 <= malloc again without hook
14632e2c3a0e0 ( +10929) 93e ___ 2e6e _ c0c006 ______________ 0 | __GI___libc_malloc at glibc/malloc/malloc.c:3090 | mallochook at glibc/malloc/mcheck.c:296 <= hook put back the hook
14632e2c3a637 ( +1367) 93f ___ 2e6e ______ a ___ 555555898320 |__GI__IO_file_doallocate at glibc/libio/filedoalloc.c:101 | mallochook at glibc/malloc/mcheck.c:308 <= hook returns chunk
Now is the end of the traces just after an abort from mcheck:
14632e40e983b ( +14317) _ 73 ___ 2e6e _____ 1f ___ 55555588d600 + rembgstatus at zsh-git/Src/jobs.c:2147 | remnode at zsh-git/Src/linklist.c:307
14632e40e9c45 ( +1034) _ 74 ___ 2e6e ___ f000 ___ 55555588d600 + remnode at zsh-git/Src/linklist.c:316 | zfree at zsh-git/Src/mem.c:1897
14632e40ea015 ( +976) f7d ___ 2e6e ______ f ___ 55555588d600 | zfree at zsh-git/Src/mem.c:1899 | freehook at glibc/malloc/mcheck.c:251
14632e40ea39c ( +903) f7e ___ 2e6e _ dead00 ___ 55555588d600 | freehook at glibc/malloc/mcheck.c:258 | checkhdr at glibc/malloc/mcheck.c:172
Here rembgstatus() wants to free the chunk 0x55555588d600, but mcheck detects it is corrupted.
So we search back when it was allocated and we get this interesting sequence:
A 14632e2a6c4f2 ( +11779) 8a3 ___ 2e6e _____ a0 ___ 7ffff7317f90 |__GI__IO_file_doallocate at glibc/libio/filedoalloc.c:101 | __GI___libc_malloc at glibc/malloc/malloc.c:3037
A 14632e2a6c825 ( +819) 8a4 ___ 2e6e _ c0c005 ______________ 0 | __GI___libc_malloc at glibc/malloc/malloc.c:3090 | mallochook at glibc/malloc/mcheck.c:289
A 14632e2a6cacf ( +682) 8a5 ___ 2e6e _____ a0 ______________ 0 | mallochook at glibc/malloc/mcheck.c:295 | __GI___libc_malloc at glibc/malloc/malloc.c:3037
B 14632e2a797b1 ( +52450) 620 ___ 2e6e ___ f000 ___ 55555588e560 + freejob at zsh-git/Src/jobs.c:1367 (discriminator 2) | zfree at zsh-git/Src/mem.c:1897
B 14632e2a7aab0 ( +4863) 8a6 ___ 2e6e ______ f ___ 55555588e560 | zfree at zsh-git/Src/mem.c:1899 | freehook at glibc/malloc/mcheck.c:251
C 14632e2a7b41a ( +2410) 621 ___ 2e6e ___ a001 ______________ 0 + addbgstatus at zsh-git/Src/jobs.c:2192 | zalloc at zsh-git/Src/mem.c:967
C 14632e2a7b7a4 ( +906) 8a7 ___ 2e6e _____ a0 ______________ 0 | zalloc at zsh-git/Src/mem.c:968 | __GI___libc_malloc at glibc/malloc/malloc.c:3037
C 14632e2a7bb51 ( +941) 622 ___ 2e6e ___ a002 ___ 55555586b2a0 + addbgstatus at zsh-git/Src/jobs.c:2192 | zalloc at zsh-git/Src/mem.c:973
D 14632e2a7beee ( +925) 623 ___ 2e6e ___ a001 e43b7a17062d3000 + zinsertlinknode at zsh-git/Src/linklist.c:206 | zalloc at zsh-git/Src/mem.c:967
D 14632e2a7c21b ( +813) 8a8 ___ 2e6e _____ a0 ______________ 0 | zalloc at zsh-git/Src/mem.c:968 | __GI___libc_malloc at glibc/malloc/malloc.c:3037
D 14632e2a7d5ac ( +5009) 624 ___ 2e6e ___ a002 ___ 55555588d600 + zinsertlinknode at zsh-git/Src/linklist.c:206 | zalloc at zsh-git/Src/mem.c:973
D 14632e2a7e702 ( +4438) 625 ___ 2e6e ______ a ___ 55555588d600 + addbgstatus at zsh-git/Src/jobs.c:2199 | zinsertlinknode at zsh-git/Src/linklist.c:207
A 14632e2a808a2 ( +8608) 8a9 ___ 2e6e _ c0c006 ______________ 0 | __GI___libc_malloc at glibc/malloc/malloc.c:3090 | mallochook at glibc/malloc/mcheck.c:296
A 14632e2a80dcf ( +1325) 8aa ___ 2e6e ______ a ___ 555555898380 |__GI__IO_file_doallocate at glibc/libio/filedoalloc.c:101 | mallochook at glibc/malloc/mcheck.c:308
I prefixed the lines with letters to explain what is going on:
A: file_doallocate() is calling malloc() normally, as in the previous
example, but is interrupted just after removing the mcheck hook
B: freejob() is calling zfree() then free(), from signal context
C: addbgstatus() is doing a first malloc() which is not registered
by mcheck because there is no hook
D: same for zinsertlinknode(); this is this chunk which will trigger
the error
A: the main context resumes, malloc() from file_doallocate() reinstall
the hook and continues.
To make sure we are re-entering malloc from the signal handler, I
added some flags with an abort, and here is the bt:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff72c91f5 in __GI_abort () at abort.c:79
#2 0x00007ffff7317d6a in qweqweqwe () at mcheck.c:195
#3 0x00007ffff7315f0c in __GI___libc_malloc (bytes=8) at malloc.c:3049
#4 0x00005555555cf0e9 in zalloc (size=8) at mem.c:968
#5 0x00005555555bf36f in addbgstatus (pid=26118, status=0) at jobs.c:2192
#6 0x00005555556004e0 in wait_for_processes () at signals.c:583
#7 0x00005555556008d5 in zhandler (sig=17) at signals.c:648
#8 <signal handler called>
#9 0x00007ffff7314a55 in _int_malloc (av=av@entry=0x7ffff7643bc0 <main_arena>, bytes=bytes@entry=4145) at malloc.c:4149
#10 0x00007ffff7315be6 in __GI___libc_malloc (bytes=4145) at malloc.c:3088
#11 0x00007ffff7318335 in mallochook (size=4096, caller=0x7ffff72ff459 <__GI__IO_file_doallocate+121>) at mcheck.c:311
#12 0x00007ffff7315d4a in __GI___libc_malloc (bytes=bytes@entry=4096) at malloc.c:3057
#13 0x00007ffff72ff459 in __GI__IO_file_doallocate (fp=0x55555589aba0) at filedoalloc.c:101
#14 0x00007ffff730e379 in __GI__IO_doallocbuf (fp=fp@entry=0x55555589aba0) at genops.c:347
#15 0x00007ffff730d26c in _IO_new_file_underflow (fp=0x55555589aba0) at fileops.c:490
#16 0x00007ffff730e3f2 in __GI__IO_default_uflow (fp=0x55555589aba0) at genops.c:362
#17 0x00005555555980bc in readoutput (in=12, qt=1, readerror=0x0) at exec.c:4673
#18 0x0000555555597adf in getoutput (cmd=0x7ffff7feb28a "date +%s.%N", qt=1) at exec.c:4628
#19 0x0000555555605352 in stringsubst (list=0x7fffffffae30, node=0x7fffffffae50, pf_flags=4, ret_flags=0x7fffffffaab4, asssub=0) at subst.c:399
#20 0x00005555556042d5 in prefork (list=0x7fffffffae30, flags=6, ret_flags=0x7fffffffaab4) at subst.c:142
#21 0x0000555555596373 in execcmd_exec (state=0x7fffffffb3e0, eparams=0x7fffffffb000, input=0, output=0, how=2, last1=2, close_if_forked=-1) at exec.c:4047
#22 0x000055555558fe8d in execpline2 (state=0x7fffffffb3e0, pcode=131, how=2, input=0, output=0, last1=0) at exec.c:1930
#23 0x000055555558ea30 in execpline (state=0x7fffffffb3e0, slcode=9218, how=2, last1=0) at exec.c:1660
#24 0x000055555558dcd3 in execlist (state=0x7fffffffb3e0, dont_change_job=1, exiting=0) at exec.c:1415
#25 0x000055555558d30e in execode (p=0x5555558805c0, dont_change_job=1, exiting=0, context=0x555555627582 "shfunc") at exec.c:1194
#26 0x000055555559b9c7 in runshfunc (prog=0x5555558805c0, wrap=0x0, name=0x7ffff7feb168 "ts") at exec.c:5983
#27 0x000055555559af8b in doshfunc (shfunc=0x55555588c490, doshargs=0x7ffff7ff15f0, noreturnval=0) at exec.c:5833
#28 0x0000555555599d25 in execshfunc (shf=0x55555588c490, args=0x7ffff7ff15f0) at exec.c:5402
#29 0x0000555555595f52 in execcmd_exec (state=0x7fffffffdaf0, eparams=0x7fffffffbce0, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3968
#30 0x000055555558fe8d in execpline2 (state=0x7fffffffdaf0, pcode=2563, how=18, input=0, output=0, last1=0) at exec.c:1930
#31 0x000055555558ea30 in execpline (state=0x7fffffffdaf0, slcode=3074, how=18, last1=0) at exec.c:1660
#32 0x000055555558dcd3 in execlist (state=0x7fffffffdaf0, dont_change_job=1, exiting=0) at exec.c:1415
#33 0x00005555555c8546 in execif (state=0x7fffffffdaf0, do_exec=0) at loop.c:580
#34 0x0000555555595d34 in execcmd_exec (state=0x7fffffffdaf0, eparams=0x7fffffffc550, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3916
#35 0x000055555558fe8d in execpline2 (state=0x7fffffffdaf0, pcode=2435, how=18, input=0, output=0, last1=0) at exec.c:1930
#36 0x000055555558ea30 in execpline (state=0x7fffffffdaf0, slcode=25602, how=18, last1=0) at exec.c:1660
#37 0x000055555558dcd3 in execlist (state=0x7fffffffdaf0, dont_change_job=1, exiting=0) at exec.c:1415
#38 0x00005555555c7574 in execfor (state=0x7fffffffdaf0, do_exec=0) at loop.c:175
#39 0x0000555555595d34 in execcmd_exec (state=0x7fffffffdaf0, eparams=0x7fffffffce30, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3916
#40 0x000055555558fe8d in execpline2 (state=0x7fffffffdaf0, pcode=2371, how=18, input=0, output=0, last1=0) at exec.c:1930
#41 0x000055555558ea30 in execpline (state=0x7fffffffdaf0, slcode=33794, how=18, last1=0) at exec.c:1660
#42 0x000055555558dcd3 in execlist (state=0x7fffffffdaf0, dont_change_job=1, exiting=0) at exec.c:1415
#43 0x00005555555c7574 in execfor (state=0x7fffffffdaf0, do_exec=0) at loop.c:175
#44 0x0000555555595d34 in execcmd_exec (state=0x7fffffffdaf0, eparams=0x7fffffffd710, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3916
#45 0x000055555558fe8d in execpline2 (state=0x7fffffffdaf0, pcode=1859, how=18, input=0, output=0, last1=0) at exec.c:1930
#46 0x000055555558ea30 in execpline (state=0x7fffffffdaf0, slcode=102402, how=18, last1=0) at exec.c:1660
#47 0x000055555558dcd3 in execlist (state=0x7fffffffdaf0, dont_change_job=0, exiting=0) at exec.c:1415
#48 0x000055555558d30e in execode (p=0x7ffff7fefe98, dont_change_job=0, exiting=0, context=0x5555556290be "toplevel") at exec.c:1194
#49 0x00005555555b4698 in loop (toplevel=1, justonce=0) at init.c:212
#50 0x00005555555b8795 in zsh_main (argc=2, argv=0x7fffffffddd8) at init.c:1770
#51 0x000055555556b053 in main (argc=2, argv=0x7fffffffddd8) at ./main.c:109
Antoine
Messages sorted by:
Reverse Date,
Date,
Thread,
Author