[Bug 2756] New: sshd does not seem to terminate despite ClientAlive[Interval|CountMax] when a process is polling a remote forwarding channel

bugzilla-daemon at bugzilla.mindrot.org
Wed Aug 9 14:06:07 AEST 2017


https://bugzilla.mindrot.org/show_bug.cgi?id=2756

            Bug ID: 2756
           Summary: sshd does not seem to terminate despite
                    ClientAlive[Interval|CountMax] when a process is
                    polling a remote forwarding channel
           Product: Portable OpenSSH
           Version: 6.7p1
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: sshd
          Assignee: unassigned-bugs at mindrot.org
          Reporter: willchan at google.com

Hello,

The short summary of my situation: I have a mobile client that
establishes an ssh connection to a server and uses remote port
forwarding to expose access to local services. On the server side, a
monitoring service (a Prometheus instance we run) polls via the
remote port. When the mobile WAN connection dies, the client attempts
to re-establish the ssh connection and the same remote port forwarding.
It fails with an "error: channel_setup_fwd_listener_tcpip: cannot listen
to port:". Our script retries every 15 seconds, but every attempt
fails until approximately 15 minutes have passed.

I should note at this point that the client is running OpenSSH_7.2p2
and the server is running OpenSSH_6.7p1. Both are running Linux, albeit
different distros.

So, we thought we could handle this problem by setting
ClientAliveInterval and ClientAliveCountMax in the server's
sshd_config. We set ClientAliveInterval to 10 and ClientAliveCountMax
to 3, but that does not appear to solve the issue.
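
For reference, a minimal sketch of the arithmetic behind those two
settings, assuming the behavior described in sshd_config(5): an
unresponsive client is disconnected after roughly ClientAliveInterval
times ClientAliveCountMax seconds. The function name is hypothetical
and only illustrates the expectation being tested in this report.

```python
def expected_disconnect_seconds(client_alive_interval: int,
                                client_alive_count_max: int) -> int:
    """Approximate time before sshd drops an unresponsive client.

    Assumes one keepalive probe per interval and a disconnect once
    client_alive_count_max probes in a row go unanswered.
    """
    return client_alive_interval * client_alive_count_max


# Values from the report: ClientAliveInterval 10, ClientAliveCountMax 3.
print(expected_disconnect_seconds(10, 3))  # 30
```

With the values above, the old session would be expected to go away
after about 30 seconds rather than about 15 minutes.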

We dove in further, and have noted the following:
* It appears that the old sshd process that is listening on the remote
port is still alive, which explains the
channel_setup_fwd_listener_tcpip error.
* The old sshd process goes away after about 15 minutes.
* The server's tcp_retries2 is set to 15 (the default).
* The monitoring service is polling every second.
* The server has many TCP sockets to the remote port forward in
CLOSE_WAIT. I presume this is because the monitoring service is closing
its connection to the remote forwarding channel, but the sshd process
isn't closing its end of the connection, since the client hasn't closed
the channel.
* When we reduced tcp_retries2 to 8, the time for the sshd process to
exit dropped to about 2 minutes.
* We also tried increasing our monitoring polling interval to 1 minute,
which seemed to reduce the recovery time to under a minute.
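
The recovery times in the list above can be compared against the
timeout that tcp_retries2 implies. The following is a rough sketch,
assuming the Linux defaults of a 200 ms minimum and a 120 s maximum
retransmission timeout (neither value is stated in this report) and
exponential backoff between retries. It gives a lower bound only; the
real timeout depends on the measured round-trip time. The function
name is hypothetical.

```python
# Assumed Linux constants (not taken from the report).
RTO_MIN_MS = 200        # minimum retransmission timeout
RTO_MAX_MS = 120_000    # maximum retransmission timeout


def hypothetical_timeout_seconds(tcp_retries2: int) -> float:
    """Lower-bound estimate of how long TCP retransmits before giving up."""
    # Number of doublings before the RTO reaches its cap: floor(log2(max/min)).
    thresh = (RTO_MAX_MS // RTO_MIN_MS).bit_length() - 1
    if tcp_retries2 <= thresh:
        # All waits are still in the exponential phase.
        total_ms = ((2 << tcp_retries2) - 1) * RTO_MIN_MS
    else:
        # Exponential phase, then the remaining waits at the capped RTO.
        total_ms = ((2 << thresh) - 1) * RTO_MIN_MS
        total_ms += (tcp_retries2 - thresh) * RTO_MAX_MS
    return total_ms / 1000


for retries in (15, 8):
    print(retries, hypothetical_timeout_seconds(retries))
```

Under these assumptions the estimate is 924.6 seconds (about 15.4
minutes) for tcp_retries2 = 15 and 102.2 seconds (about 1.7 minutes)
for tcp_retries2 = 8, which is roughly in line with the observed
recovery times.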

AFAICT, writing to the remote end of the forwarding channel can
interfere with ClientAliveInterval. Take this with many buckets of
salt, given that I have never looked at the code before, but I poked
into it briefly, and it appears that the select call that uses
ClientAliveInterval as its timeout watches both read and write file
descriptors. I was looking specifically at
https://github.com/openssh/openssh-portable/blob/92e9fe633130376a95dd533df6e5e6a578c1e6b8/serverloop.c#L263.
IIUC, if something (e.g. our monitoring service) is constantly writing
to the remote end of a channel, then select never times out and
client_alive_check() never gets called, even if the connection to the
client is dead.
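
The hypothesis above can be modeled with a small sketch. This is not
the sshd code; it is an illustrative loop, assuming a liveness check
that runs only when select() returns on timeout. A local socket pair
stands in for the forwarded channel, and the function name is
hypothetical.

```python
import select
import socket


def count_idle_timeouts(busy: bool, rounds: int = 5,
                        timeout: float = 0.05) -> int:
    """Count how often select() times out over a number of loop rounds.

    Each timeout is the only point where the modeled liveness check
    would run, so a result of 0 means the check was starved.
    """
    reader, writer = socket.socketpair()
    timeouts = 0
    try:
        for _round in range(rounds):
            if busy:
                # Simulate the poller sending data into the channel.
                writer.send(b"x")
            readable, _, _ = select.select([reader], [], [], timeout)
            if readable:
                # Channel activity: service it, liveness check is skipped.
                reader.recv(1)
            else:
                # Timeout: the modeled liveness check would run here.
                timeouts += 1
    finally:
        reader.close()
        writer.close()
    return timeouts


print(count_idle_timeouts(busy=True))
print(count_idle_timeouts(busy=False))
```

In this model, constant traffic keeps a descriptor ready on every
round, so the timeout branch is never reached, while an idle channel
reaches it on every round.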

At this point, I figured I'd ask for help. Did I understand the code
correctly that client liveness is not checked if the remote end of a
forwarding channel receives data to forward onward to the client? If
not, can anyone else help explain the situation we're seeing? Or if I
managed to read the code correctly, can someone tell me if that's the
desired behavior for ClientAliveInterval, and if so, how I should be
configuring sshd to close the session when the client connection is
dead, even if the remote end of the forwarding channel is being written
to?

Thanks in advance, and apologies in advance if I've missed something
obvious or neglected to include important information.

-Will


