Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: BUG: Zsh loses history entries since 2015



Hello again

Thanks everyone for your replies.
I’ll respond to all open questions I am aware of:

> > Why? What does it even mean to interrupt the loading of history at
> > shell startup?
> > Why would I ever want a shell process to only partially load my history?
>
> If shell startup seems to be especially slow, being able to interrupt it
> with Ctrl-C and get a basic shell prompt is useful. NFS and temporary
> network issues can be the cause. If I just want to get in to fix a
> problem a degraded but working shell is much better than having it hang
> indefinitely.

OK. I am intentionally putting .zsh_history on local storage,
but I understand that (e.g.) universities like to use homedirs-on-NFS.

I think it’s key in such situations that the shell prints a message.

> > Do you mean just when exiting (that seems okay) or also on startup?
> > Applying this behavior at startup would mean that users could run a shell
> > that just silently does not save to history, right? That does not sound good.
>
> Not good indeed but losing one session's new commands may be preferable
> to losing part of the existing history file. Perhaps the shell can print
> something to warn that history saving is disabled along with the command
> to ignore and reenable it.

Yeah, I think the shell definitely has to print something.

Losing new commands may be better than losing old commands (or not!),
but fundamentally I think we should strive to never lose any commands,
at least not without alerting the user loudly first.

> > Yes. During my work day, I open many SSH connections from my laptop
> > to my workstation. At the end of the work day, I kill the connection,
> > which means all those zsh processes will be killed.
>
> Given this, are you sure it's a SIGINT that's being received?
>
> Typically in cases of a connection shutdown the signal would be
> SIGHUP, potentially followed by a SIGKILL if it's a local terminal in
> the window/session manager rather than a remote one via SSH.

You’re right, I checked: SIGHUP gets sent when exiting.

However, in the core dump I have from just before I sent my report
to this zsh mailing list, I can see that the last_signal variable
has value 2 (= SIGINT), so there must be a SIGINT involved here.

I don’t fully understand which signals get sent when and cause what.
When I find some time I’ll see if I can obtain more logs on that somehow.

One theory that just came to my mind:
When exiting, I quit my terminals using a mixture of Ctrl+C/Ctrl+D.
Perhaps a Ctrl+D causes zsh to exit and the Ctrl+C is actually sent
by myself! I’ll need to take a closer look the next time I reproduce it.

> > To be clear: the zsh processes that are killed in my scenario do not
> > have any pending history to save
>
> How do you know this?  Because if this could be programmatically

1. I can observe this from the bpftrace logs I collect:
   non-interrupted zsh exits read + write the same number of history bytes.
2. I use INC_APPEND_HISTORY and do not leave half-entered commands
   in my various shell sessions. Hence, I know there is no pending history.

> determined at time of exit, the entire history operation could be
> optimized away.

I don’t think it can, unless we’re okay with skipping the size and duplicate
enforcement, which only happens at exit time?

> > On closer look I am reminded that readhistfile() is called from within
> > savehistfile(), so this should be clarified as specifically "happens
> > during write-back".
>
> I think there's an opportunity to do a little more error-handling
> specifically when readhistfile() has failed during savehistfile().
> See patch.

Thanks, I can confirm that your patch fixes the issue.

> When savehistfile() is doing straightforward write-back, it only stops
> if fprintf() or fputc() returns -1.  It feels like there's a potential
> race condition there, but it'd require some queue_signals fiddling or
> similar to resolve (or at least reduce) it.  Look for the
> extended_history condition + fprintf() to see where I mean.  I haven't
> attempted any changes with that, but a SIGHUP here certainly appears
> as if it would leave a truncated history file if HIST_SAVE_BY_COPY is
> not set, and a SIGINT probably so.

You mean receiving a SIGHUP while in fprintf() or fputc()?

The POSIX standard says that fputc can return EINTR on signal:
https://pubs.opengroup.org/onlinepubs/9690949599/functions/fputc.html

And the glibc says say that it’s up to the application to handle EINTR:
https://www.gnu.org/software/libc/manual/html_node/Interrupted-Primitives.html

So, should we handle EINTR?

> I further suspect there's a logic error at the end of the
> readhistline() loop in readhistfile(), proposed fix in the patch.  I
> don't believe that would change about the cases that were already
> being caught.

Which logic error do you mean specifically? (It’s not clear from the patch.)

> Another thing I haven't addressed (and am not sure how to) -- if the
> history operation is stopped (TSTP or STOP signal) then when resumed
> its file positions may no longer have any relation to the actual
> files.  I suppose in theory the fact that it would lockhistfile()
> before being stopped should mean that no other changes can apply
> before it gets restarted?

Yes, I think relying on the existing lockhistfile() should prevent
issues around SIGSTOP/SIGCONT.
I also noticed that when pressing Ctrl+Z while reading/writing history,
the shell does not actually get suspended.

Best regards
Michael




Messages sorted by: Reverse Date, Date, Thread, Author