ssh-agent: perform AGENTC_REMOVE_ALL_IDENTITIES on SIGUSR1

Fri Sep 10 22:06:10 AEST 2021

Hello Peter.

Well in the end i am only def and blind and need some time for
wrapping my head around a problem.  I think in fact that de
Raadt's initial comment pointed into the right direction, and that
requires nothing such, but the patch would be

  diff --git a/ssh-agent.1 b/ssh-agent.1
  index ed8c870969..dd2b0e8af1 100644
  --- a/ssh-agent.1
  +++ b/ssh-agent.1
  @@ -178,6 +178,9 @@ will automatically use them if present.
   is also used to remove keys from
   .Nm
   and to query the keys that are held in one.
  +.Nm
  +will remove all keys when it receives the user signal
  +.Dv SIGUSR1 .
   .Pp
   Connections to
   .Nm
  diff --git a/ssh-agent.c b/ssh-agent.c
  index 48a47d45a9..014cf7b38c 100644
  --- a/ssh-agent.c
  +++ b/ssh-agent.c
  @@ -171,6 +171,11 @@ static int fingerprint_hash = SSH_FP_HASH_DEFAULT;
   /* Refuse signing of non-SSH messages for web-origin FIDO keys */
   static int restrict_websafe = 1;

  +/* Request to remove_all_identities() pending */
  +static volatile sig_atomic_t remove_all_asap = 0;
  +
  +static enum run_state { RS_RUN, RS_POLL } run_state = RS_RUN;
  +
   static void
   close_socket(SocketEntry *e)
   {
  @@ -529,11 +534,10 @@ process_remove_identity(SocketEntry *e)
   }

   static void
  -process_remove_all_identities(SocketEntry *e)
  +remove_all_identities(void)
   {
   	Identity *id;

  -	debug2_f("entering");
   	/* Loop over all identities and clear the keys. */
   	for (id = TAILQ_FIRST(&idtab->idlist); id;
   	    id = TAILQ_FIRST(&idtab->idlist)) {
  @@ -543,6 +547,13 @@ process_remove_all_identities(SocketEntry *e)

   	/* Mark that there are no identities. */
   	idtab->nentries = 0;
  +}
  +
  +static void
  +process_remove_all_identities(SocketEntry *e)
  +{
  +	debug2_f("entering");
  +	remove_all_identities();

   	/* Send success. */
   	send_status(e, 1);
  @@ -1342,6 +1353,16 @@ cleanup_handler(int sig)
   	_exit(2);
   }

  +/*ARGSUSED*/
  +static void
  +remove_all_handler(int sig)
  +{
  +	if (run_state == RS_POLL)
  +		remove_all_identities();
  +	else
  +		remove_all_asap = 1;
  +}
  +
   static void
   check_parent_exists(void)
   {
  @@ -1619,6 +1640,7 @@ skip:
   	ssh_signal(SIGINT, (d_flag | D_flag) ? cleanup_handler : SIG_IGN);
   	ssh_signal(SIGHUP, cleanup_handler);
   	ssh_signal(SIGTERM, cleanup_handler);
  +	ssh_signal(SIGUSR1, remove_all_handler);

   	if (pledge("stdio rpath cpath unix id proc exec", NULL) == -1)
   		fatal("%s: pledge: %s", __progname, strerror(errno));
  @@ -1626,11 +1648,17 @@ skip:

   	while (1) {
   		prepare_poll(&pfd, &npfd, &timeout, maxfds);
  +		run_state = RS_POLL;
   		result = poll(pfd, npfd, timeout);
  +		run_state = RS_RUN;
   		saved_errno = errno;
   		if (parent_alive_interval != 0)
   			check_parent_exists();
  -		(void) reaper();	/* remove expired keys */
  +		if (remove_all_asap) {
  +			remove_all_asap = 0;
  +			remove_all_identities();
  +		} else
  +			(void) reaper();	/* remove expired keys */
   		if (result == -1) {
   			if (saved_errno == EINTR)
   				continue;

And if you really want to be super exact, one would add

  diff --git a/ssh-agent.c b/ssh-agent.c
  index 014cf7b38c..1ca8d3229e 100644
  --- a/ssh-agent.c
  +++ b/ssh-agent.c
  @@ -1647,6 +1647,10 @@ skip:
   	platform_pledge_agent();

   	while (1) {
  +		if (remove_all_asap) {
  +			remove_all_asap = 0;
  +			remove_all_identities();
  +		}
   		prepare_poll(&pfd, &npfd, &timeout, maxfds);
   		run_state = RS_POLL;
   		result = poll(pfd, npfd, timeout);

So with these two there is the security guarantee you were asking
for, that the removal happens instantly.  It is also portable.
Since all signals are blocked, all we need to know is whether we
are hanging on poll(2), in which case it is "safe" to remove from
within the signal handler in that no other code is currently
running in user space.  I'd say.  (Of course on Cygwin poll(2)
could be implemented like grazy, i have not looked in that for
i would say really twenty years, select(2) by then, but then again
i think they should block signals while working this tremendous
piece of code in something that people think are system calls.)

 |with a signal the sender can not know whether removal completed successf\
 |ully,
 |failed or is ongoing. They can merel hope for the best. That's a very weak
 |security promise.

But why so Peter?  Signals are used for communication on Unix
since ever?  If you send a signal to a process, and it has not
ignored the signal (if it can), the action behind happens.

You can see that easily on your Linux system, it kill aborts any
program you have rights for, just specify a realtime signal.

 |I for one would expect suspend to wait for successful removal and
 |that the suspend is cancelled with an error message should removal
 |fail, to know the state of agents.

What should fail when zeroing memory?  There is no status for this
operation anyhow, is it?  It is merely you calling ssh-add, which
then reports interaction with the agent was successful.

  ...
 |>|direct interaction with the agents look a lot better, with the
 |> 
 |> How do you do this, Mister, from an ACPI script running as root?
 |
 |for SSH_AUTH_SOCK in /tmp/ssh-*/agent.*; do ssh-add -D; done # maybe?

This is overly funny.

Done.  I always hate it to look in codebases that i do not know.
There is so much work and time in such a codebase, done by people
knowing the context exactly, and i just hate it.  I mean this is
effectively an easy thing here (but for me), but in general
someone who knows a codebase can implement something _right_
(avoiding pitfalls) in a very short time, a state that third
parties can achieve only with real effort.

So the patch above should possibly have been it from the start.
Maybe it will be considered, that would be nice.  (Just 's/^  //.)

Ciao from Germany,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)