ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1

Sun Sep 12 02:47:35 AEST 2021

Hi Steffen,

Steffen Nurpmeso wrote:
> So with these two there is the security guarantee you were asking
> for, that the removal happens instantly.

Sorry but no, as long as the trigger is a signal there's no guarantee.
See below.

> It is also portable.

Nice!

> |with a signal the sender can not know whether removal completed
> |successfully, failed or is ongoing. They can merel hope for the best.
> |That's a very weak security promise.
> 
> But why so Peter?  Signals are used for communication on Unix since ever?

In your stated use case of suspend on laptop lid close there is a race
condition between the actual suspend and SIGUSR1 being handled in agents.

The ACPI script runs various commands to prepare for suspend, one of
them would be pkill or killall. That command would find all agent
processes and call kill(2) to send SIGUSR1 to each one.

There's a race because signal handling is asynchronous.

kill(2) returns as soon as the signal has been *generated*, not when the
signal handler in the receiving process has completed, in any case quite
possibly before the signal handler has even started.

After generating signals the pkill/killall command exits and from there
execution of the suspend script (which continues towards suspending) and
execution of signal handlers in agents (clearing keys) are decoupled
without any way to synchronize - the suspend script can't get feedback
from the agents.

Consider as an exaggerated example the pkill/killall command being the
very last command in the ACPI script before echo mem > /sys/power/state
and a system with two CPU threads and 100+ running agents.

I wouldn't be surprised if the kernel will have suspended the machine
before all 100+ signal handlers have finished clearing keys. Maybe some
or many kernels will reliably deliver all pending signals before suspending,
but that doesn't mean much beyond scheduling the signal handlers, right?

In particular I don't expect any kernel to wait for all processes to be
idle or otherwise wait for signal handlers to return.

I also wouldn't be surprised if the kernel would suspend the machine
at least sometimes before keys were removed with just one agent running.

That may be less likely, but the point is that with a signal there's no
way to know *for sure* that signal handlers have completed before suspend.

> ignored the signal (if it can), the action behind happens.

Yes. The signal handler will run at some point, but maybe not until
after resume, leaving keys in the agent while the machine is suspended,
ie. silently failing to do what was intended.

>  |I for one would expect suspend to wait for successful removal and
>  |that the suspend is cancelled with an error message should removal
>  |fail, to know the state of agents.
> 
> What should fail when zeroing memory?

Zeroing can't fail - later patch iterations have improved in this regard!

In earlier iterations at least socket() and connect() could have failed.

> There is no status for this operation anyhow, is it?

There is; anything but successful exit of ssh-add -D indicates that the
intended operation was not (or not yet) completed as intended.

The operation is logically simple, but technically complex enough that
there can be many reasons for failure. Rather than trying to explicitly
check them all I like that if ssh-add -D exits with success then, and
only then, I do know that the intended operation has completed.

>  |>|direct interaction with the agents look a lot better, with the
>  |> 
>  |> How do you do this, Mister, from an ACPI script running as root?
>  |
>  |for SSH_AUTH_SOCK in /tmp/ssh-*/agent.*; do ssh-add -D; done # maybe?
> 
> This is overly funny.

Why? Did you test it and/or spot a problem? It does work for me.

If this is run in the suspend script then there's no race condition.

With added error checking of ssh-add exit status the machine could
be made to not suspend unless *after* all keys had been removed.

>  |>|significant additional advantage of needing zero new agent code.
>  |> 
>  |> Oh nooo!  Spoiler!!!
>  |
>  |When in doubt, keep the trusted codebase small.
> 
> Done.

I disagree; every addition grows the codebase.

> I always hate it to look in codebases that i do not know.

I can understand that!

I see that a bit differently: I find it interesting to look in code that
I don't know, because I may learn from it.

You're right that it requires effort, but so does working with my own code. :)

Kind regards

//Peter