Help with ssh -A, screen, ssh -a, detach, logout

Thu Jun 4 08:07:48 AEST 2020

raf wrote:
> I've noticed some ssh behaviour that I wish didn't
> happen. I was wondering if someone can explain how I
> can stop it from happening, or explain why it's
> unavoidable.

The problem you are running into is due to using a live network
connection from the remote system back to your local system's
ssh-agent.  If you weren't doing that then you would not run into this
problem.  If you want to continue to do this then you will need to
work around the problem.

> If I ssh-with-agent-forwarding from one host to a
> second host, and on the second host use something like

That's the live network connection that you are setting up.

> nohup/screen/tmux/daemon, and from within that new

And those are long-running environments.  That means that if you
break the connection to the ssh-agent, it is possible to create
situations where things running in the above environments still want
to keep talking to the ssh-agent.

> process session, start a long-running command via
> ssh-without-agent-forwarding on a third host, I would
> expect to be able to (e.g.) detach from the screen
> session and log out of the second host, but my shell

To be able to detach from the session and log out, the environment
must not keep any resources open.  The resource here is the file
descriptor that is connected to the network socket that is connected
to the ssh-agent.  That's open.  Meaning a non-zero reference count.
Meaning that it does not get closed.  Meaning that ssh is going to
keep the connection open.  "Because someone is using it."

> prompt on the first host doesn't come back and even
> Ctrl-C won't break the connection between ssh on the
> first host and sshd on the second host. I have to close
> the xterm window that the shell and ssh are running in.

You could use "Enter ~ ." to forcibly close the connection too.
That is a useful command sequence.  The ~ is the escape character and
is recognized only at the beginning of a line.  See the ssh(1) manual
in the section under "ESCAPE CHARACTERS" for the full description.

> If I don't do that, the shell prompt doesn't come back
> until the long-running command on the third host has
> completed.

Correct.  That is the correct behavior.  The long-running command on
the remote is holding the file open.  Non-zero reference count.  When
the process exits then it closes the file.  Which closes the network
connection.  Which allows ssh to exit.
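
The same reference-count effect can be seen locally with a plain pipe,
no ssh involved.  This is only a sketch by analogy, assuming a POSIX sh
and a date that supports +%s.  The names elapsed, held, and released
are made up for the illustration.  A background sleep that inherits the
write end of a pipe keeps the reader waiting until it exits.

```shell
#!/bin/sh
# Illustration only: elapsed, held and released are hypothetical names.

# Print how many whole seconds a command takes.
elapsed() {
  start=$(date +%s)
  "$@"
  end=$(date +%s)
  echo $((end - start))
}

# The background sleep inherits the write end of the pipe, so cat
# does not see end-of-file until sleep exits about 2 seconds later.
held() {
  sh -c 'sleep 2 &' | cat
}

# With its output redirected, sleep holds no reference to the pipe
# and cat sees end-of-file as soon as the sh process exits.
released() {
  sh -c 'sleep 2 >/dev/null 2>&1 &' | cat
}

echo "descriptor held:     $(elapsed held)s"
echo "descriptor released: $(elapsed released)s"
```

The first case should take about as long as the sleep and the second
should return almost at once.  Nothing was killed in either case.  The
only difference is whether a running process still holds the
descriptor.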

> To see what I mean:
> 
>   - on host1: Have ssh-agent running with an identity loaded
>   - on host1: "xterm &" (start an xterm on similar)

All good.

>   - on host1 in xterm: "ssh -A host2" (ssh-with-agent-forwarding to host2)

At this point my personal opinion is that we should pause and think
about why -A might be wanted here.  I realize the option exists.  I
realize that many people use this option a lot.  But personally I
almost never use that option.  I don't need to use that option.  That
option is just a door that opens to a room filled with a lot of
security layer related questions.  Which might be okay.  Or might be
a door I don't want to open.

Why are you using -A there?  Do you really need to use it?  That would
be a good discussion point.  Because I don't ever have a need for it.
Using it means one must trust the remote system not to be malicious.
(Which it mostly will not be.  But it is possible.)  But mostly
because the live network connection it sets up is then required to
stay available for the full lifecycle.  As you have found out.  It
creates entanglements.  It's messy.

>   - on host2: "screen" (start a screen session)

And this sets up a pitfall that might or might not be fallen into.
Every shell started within the screen session inherits the environment
variables from the ssh connection.  You will probably see something
like this example.

  rwp@madness:~$ env | grep SSH
  SSH_AUTH_SOCK=/tmp/ssh-YsdgP0Eexk/agent.14641
  SSH_CONNECTION=192.168.230.119 44148 192.168.230.123 22
  SSH_CLIENT=192.168.230.119 44148 22
  SSH_TTY=/dev/pts/4

The problem is the SSH_AUTH_SOCK which is setting up the connectivity
to the ssh-agent on your originating client.  If you avoid that then
you avoid the problem.  I daresay the easiest way to avoid it is to
avoid the -A option.  But if you must use it then when setting up
screen you can remove it from the environment.

  env -u SSH_AUTH_SOCK screen

Here I am using 'env' to unset the variable from the environment.
Using 'env' is also a canonical idiom for setting or clearing
environment variables regardless of the command line shell that anyone
might be using.  Because bash, ksh, zsh, csh, and all of those have
slightly different syntax.  But invoking 'env' this way would be
identical in all of them.  Which makes it easiest for me to suggest
using env in this way and knowing that it will work regardless of the
user shell environment.  Isn't that pretty nice? :-)
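
To see what 'env -u' does without involving screen at all, here is a
small sketch.  It assumes an env that supports -u, as the GNU and BSD
versions do, and the socket path is a made-up value rather than a real
agent socket.

```shell
#!/bin/sh
# Illustration only: the socket path below is a made-up value.
SSH_AUTH_SOCK=/tmp/ssh-EXAMPLE/agent.0
export SSH_AUTH_SOCK

# The child started through env -u does not see the variable.
child=$(env -u SSH_AUTH_SOCK sh -c 'echo "${SSH_AUTH_SOCK:-unset}"')
echo "child sees:  $child"

# The invoking shell is untouched.
echo "parent sees: $SSH_AUTH_SOCK"
```

The variable is removed only for the command that env starts, which is
why it is safe to use in front of screen while leaving the login shell
alone.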

>   - on host2 in screen: "ssh -a host3 sleep 60" (long-running cmd on host3)

And here you are using -a to *avoid* the ssh-agent that was set up
with the -A in the originating invocation.  Layers and layers!  If the
originating -A was removed then this -a could be removed.  Simplify!

>   - on host2 in screen: Ctrl-a d (detach from the screen session)

But it really can't!  Because of the live long running network
connection to the ssh-agent.  "The cake is a lie!"

>   - on host2: Ctrl-d (log out of host2)

This is not quite a "log out of host2".  This is an "exit the command
line shell running on host2".  The difference is important in this
specific case.  Because the command line shell will exit.  But the
command line shell is running under sshd on the remote host.  And that
is talking to the ssh on the local host.  And as described the remote
sshd is going to need to keep running waiting for the live network
connection to your ssh-agent to close.

>   - on host1: wait a long time for the shell prompt to appear or close xterm

Right.  As it should be doing.  Due to the use of -A.

> In other words, I want the agent to be forwarded to
> host2, so that I can then ssh from there to host3, but
> I don't want the agent to be forwarded to host3 because
> it's not needed there. Note that my real command was
> rsync so both host2 and host3 were involved.

That is the critical point.  I've written a bunch here already.  With
a lot of opinions embedded! :-)  So I will more or less pause for a
moment here for reflection.  Because everything becomes interconnected
and understanding those interconnections will make working with things
all make sense.  "Know how the system works.  Know how to work the
system." :-)

> My hypothesis is that agent forwarding has something to
> do with why the connection between host1 and host2
> isn't cleanly closed.

And I believe your hypothesis to be a correct one.

> Any suggestions?

I am missing some details of your environment and dependencies so this
is a potentially bad suggestion.  But if I absolutely needed to ssh
from host1 to host2 and then absolutely needed to use host2 as a
launching point to get to host3 and other places then I would create a
unique ssh key on host2 and start an ssh-agent running on host2 using
that key.  Then use that key to get to host3 and other hosts.
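
As a rough sketch of that idea, the following creates a key and loads
it into a fresh agent.  It uses a throwaway directory and an empty
passphrase only so that it can be tried safely.  The file name and the
comment string are made up.  For real use I would expect a path under
~/.ssh and a passphrase.

```shell
#!/bin/sh
# Sketch only: throwaway directory, empty passphrase, made-up names.
dir=$(mktemp -d)
key=$dir/id_ed25519_host2

# Create a key pair that exists only on this host.
ssh-keygen -q -t ed25519 -N '' -C 'host2 launch key' -f "$key"

# Start an agent and load the new key into it.
eval "$(ssh-agent -s)" >/dev/null
ssh-add "$key"

# Show what the agent now holds.
loaded=$(ssh-add -l)
echo "$loaded"

# Clean up the demonstration agent and key.
ssh-agent -k >/dev/null
rm -rf "$dir"
```

The public half, "$key.pub", would then need to go into
~/.ssh/authorized_keys on host3 and the other hosts, for example with
ssh-copy-id.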

There is also a very convenient utility that I hesitate to mention
because it also opens a door to a room filled with security questions.
It might be fine.  It might be unacceptable.  Some will yell that they
hate my dog because I suggest this.  Others will go, well yes, I am
using it too.  Everything all depends.

I would run 'keychain' on host2 so that every time you log into host2
it reattaches your command line shell environment to an ssh-agent
running on host2.  Since it seems like you are really using host2 as
your main home base of operations.  Maybe your originating client is
your mobile laptop or something.  That's fine.  You want to be able to
suspend your laptop and then move to another WiFi network and resume
and then reconnect.  Maybe.  That's all fine.  I do that routinely.
And if host2 is your main base of operations then it would be where I
would be running the main ssh-agent that is used to log into the other
hosts.  Or maybe it is just a local base of operations for a single
computer cluster of compute farm machines all being administered
together.  Same thing.

You can read about keychain in the upstream docs.  If you are not
running Funtoo ignore all of the Funtoo references as keychain has
almost certainly been packaged for your OS software distribution.  On
my system "apt-get install keychain" installs it.  It's really just
one #!/bin/sh shell script.  Very portable.  Even if it is not
packaged for your OS you can almost certainly use a copy from your
home as it is simply a shell script.

  https://www.funtoo.org/Keychain

Then in my .profile I have this code to set it up.  This would go on
host2's ~/.profile.  Or ~/.bash_profile if that is what you are using.
Or ~/.zlogin or whatever.  Make sure you know your shell's start up
environment file and do the right thing.  This is for bash or ksh.

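  # Start or reuse an ssh-agent via keychain, if keychain is installed.
  if (keychain --version) >/dev/null 2>&1; then
    keychain -q
    # Load the agent variables that keychain saved for this host.
    if [ -f "$HOME/.keychain/$(hostname)-sh" ]; then
      . "$HOME/.keychain/$(hostname)-sh"
    fi
  fi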

A newly started ssh-agent on host2 would need ssh-add to be run at
least once in order to load your ssh keys into the running agent.  But
then it will continue to run there even after you log out.  However if
you flush the keys from the agent or host2 reboots or whatever then
you would need to ssh-add again after that point in order to load up
ssh keys in that agent.  Also you can ssh-add -D to delete identities
at any time to prevent its further use until you add keys back into it.
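
To tell which of those states the agent is in, the exit status of
"ssh-add -l" can be checked.  This is a small sketch based on the exit
statuses described in ssh-add(1).  The helper name agent_state is made
up for the illustration.

```shell
#!/bin/sh
# Illustration only: agent_state is a hypothetical helper name.

# Translate the exit status of "ssh-add -l" into words.
agent_state() {
  case $1 in
    0) echo "agent running, identities loaded" ;;
    1) echo "agent running, no identities loaded" ;;
    2) echo "cannot contact an agent" ;;
    *) echo "unexpected status $1" ;;
  esac
}

ssh-add -l >/dev/null 2>&1
agent_state $?
```

A status of 1 is the "run ssh-add again" case, such as after a reboot
of host2 or after ssh-add -D.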

As I read between the lines I think this would be a good solution for
you.  However that does make some assumptions.  I am seeing lines and
trying to interpolate between them.  I am suggesting this in order to
be helpful.  But please understand the issues and then make your own
decisions.

Hope this helps! :-)

Bob