Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
Re: BUG: Zsh loses history entries since 2015
- X-seq: zsh-workers 53488
- From: Michael Stapelberg <stapelberg+zsh@xxxxxxxxxx>
- To: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
- Cc: zsh-workers@xxxxxxx, opk@xxxxxxx
- Subject: Re: BUG: Zsh loses history entries since 2015
- Date: Tue, 15 Apr 2025 11:58:26 +0200
- Archived-at: <https://zsh.org/workers/53488>
- In-reply-to: <CAH+w=7bErAhPK9325uXP4r19So_vUPUMoCVJnNF3CPz_jRiH=w@mail.gmail.com>
- List-id: <zsh-workers.zsh.org>
- References: <CAH9Oa-bfWAo2OLSaYYK+udd=+kg_7=712P+NudVvW-rR9Buagg@mail.gmail.com> <CAH+w=7Z9te+S5rUJRMrHChb8orHEkOWDfUQjzO2C30o_SFOkyg@mail.gmail.com> <CAH9Oa-Z-HtdHC+S_0FtDW9t8sdEJSdbPgdwV2YDuBZEgjYzCzA@mail.gmail.com> <CAH+w=7bp+YjZCkVqtingKkcXyDbEus9B53-vgAej_r_MLVn1EA@mail.gmail.com> <CAH+w=7bErAhPK9325uXP4r19So_vUPUMoCVJnNF3CPz_jRiH=w@mail.gmail.com>
Hello again
Thanks everyone for your replies.
I’ll respond to all open questions I am aware of:
> > Why? What does it even mean to interrupt the loading of history at
> > shell startup?
> > Why would I ever want a shell process to only partially load my history?
>
> If shell startup seems to be especially slow, being able to interrupt it
> with Ctrl-C and get a basic shell prompt is useful. NFS and temporary
> network issues can be the cause. If I just want to get in to fix a
> problem a degraded but working shell is much better than having it hang
> indefinitely.
OK. I am intentionally putting .zsh_history on local storage,
but I understand that (e.g.) universities like to use homedirs-on-NFS.
I think it’s key in such situations that the shell prints a message.
> > Do you mean just when exiting (that seems okay) or also on startup?
> > Applying this behavior at startup would mean that users could run a shell
> > that just silently does not save to history, right? That does not sound good.
>
> Not good indeed but losing one session's new commands may be preferable
> to losing part of the existing history file. Perhaps the shell can print
> something to warn that history saving is disabled along with the command
> to ignore and reenable it.
Yeah, I think the shell definitely has to print something.
Losing new commands may be better than losing old commands (or not!),
but fundamentally I think we should strive to never lose any commands,
at least not without alerting the user loudly first.
> > Yes. During my work day, I open many SSH connections from my laptop
> > to my workstation. At the end of the work day, I kill the connection,
> > which means all those zsh processes will be killed.
>
> Given this, are you sure it's a SIGINT that's being received?
>
> Typically in cases of a connection shutdown the signal would be
> SIGHUP, potentially followed by a SIGKILL if it's a local terminal in
> the window/session manager rather than a remote one via SSH.
You’re right, I checked: SIGHUP gets sent when exiting.
However, in the core dump I have from just before I sent my report
to this zsh mailing list, I can see that the last_signal variable
has value 2 (= SIGINT), so there must be a SIGINT involved here.
I don’t fully understand which signals get sent when and cause what.
When I find some time I’ll see if I can obtain more logs on that somehow.
One theory that just came to my mind:
When exiting, I quit my terminals using a mixture of Ctrl+C/Ctrl+D.
Perhaps a Ctrl+D causes zsh to exit and the Ctrl+C is actually sent
by myself! I’ll need to take a closer look the next time I reproduce it.
> > To be clear: the zsh processes that are killed in my scenario do not
> > have any pending history to save
>
> How do you know this? Because if this could be programmatically
1. I can observe this from the bpftrace logs I collect:
non-interrupted zsh exits read + write the same number of history bytes.
2. I use INC_APPEND_HISTORY and do not leave half-entered commands
in my various shell sessions. Hence, I know there is no pending history.
> determined at time of exit, the entire history operation could be
> optimized away.
I don’t think it can, unless we’re okay with skipping the size and duplicate
enforcement, which only happens at exit time?
> > On closer look I am reminded that readhistfile() is called from within
> > savehistfile(), so this should be clarified as specifically "happens
> > during write-back".
>
> I think there's an opportunity to do a little more error-handling
> specifically when readhistfile() has failed during savehistfile().
> See patch.
Thanks, I can confirm that your patch fixes the issue.
> When savehistfile() is doing straightforward write-back, it only stops
> if fprintf() or fputc() returns -1. It feels like there's a potential
> race condition there, but it'd require some queue_signals fiddling or
> similar to resolve (or at least reduce) it. Look for the
> extended_history condition + fprintf() to see where I mean. I haven't
> attempted any changes with that, but a SIGHUP here certainly appears
> as if it would leave a truncated history file if HIST_SAVE_BY_COPY is
> not set, and a SIGINT probably so.
You mean receiving a SIGHUP while in fprintf() or fputc()?
The POSIX standard says that fputc can return EINTR on signal:
https://pubs.opengroup.org/onlinepubs/9690949599/functions/fputc.html
And the glibc says say that it’s up to the application to handle EINTR:
https://www.gnu.org/software/libc/manual/html_node/Interrupted-Primitives.html
So, should we handle EINTR?
> I further suspect there's a logic error at the end of the
> readhistline() loop in readhistfile(), proposed fix in the patch. I
> don't believe that would change about the cases that were already
> being caught.
Which logic error do you mean specifically? (It’s not clear from the patch.)
> Another thing I haven't addressed (and am not sure how to) -- if the
> history operation is stopped (TSTP or STOP signal) then when resumed
> its file positions may no longer have any relation to the actual
> files. I suppose in theory the fact that it would lockhistfile()
> before being stopped should mean that no other changes can apply
> before it gets restarted?
Yes, I think relying on the existing lockhistfile() should prevent
issues around SIGSTOP/SIGCONT.
I also noticed that when pressing Ctrl+Z while reading/writing history,
the shell does not actually get suspended.
Best regards
Michael
Messages sorted by:
Reverse Date,
Date,
Thread,
Author