[Bug 2265] New: ServerAlive{Interval,CountMax} ignored if using an active -R or -L tunnel

Tue Aug 26 07:34:27 EST 2014

https://bugzilla.mindrot.org/show_bug.cgi?id=2265

            Bug ID: 2265
           Summary: ServerAlive{Interval,CountMax} ignored if using an
                    active -R or -L tunnel
           Product: Portable OpenSSH
           Version: -current
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: ssh
          Assignee: unassigned-bugs at mindrot.org
          Reporter: openssh at orib.net

Scenario:

1. Set up a local socket server that sends data slowly enough so that
buffers would take hours to fill up:

  $ (until false; do echo -n X; sleep 2; done) | nc -l 8000 &

2. Connect through an unreliable connection, asking ssh to detect a broken
connection within 10 seconds (a 5 second "alive" interval, at most 2
missed replies):

  $ ssh -R 8001:127.0.0.1:8000 \
        -o 'ServerAliveInterval 5' -o 'ServerAliveCountMax 2' \
        -o 'ProxyCommand nc 127.0.0.1 22' \
        127.0.0.1 'telnet 127.0.0.1 8001'

(this assumes you can ssh into localhost using either a password or
public key authentication)

3. Observe that indeed, you are getting 'X' printed every 2 seconds,
through the ssh tunnel.

4. Suspend the intermediate proxy - in another terminal / screen
session (or after backgrounding the ssh command above), do:

   $ pkill -STOP -xf 'nc 127.0.0.1 22'

5. Wait 10 seconds for ServerAlive detection to kick in. Or 10 hours.
ServerAlive detection never actually kicks in.

6. Tear down everything (it is enough to Ctrl-C the ssh command).

7. Repeat steps 1-5, this time, with 'sleep 2' replaced by 'sleep 30'.
This time, ServerAlive detection kicks in as expected.

This happens on every OpenSSH version I've tried (all on Linux: the
versions shipped with Ubuntu 8.04, 10.04, 10.10, 12.04 and 14.04), and
from browsing the source code it is still present in -current.

The problem is in the "ServerAlive" logic (and, I assume, also in the
ClientAlive logic on the server side, though I haven't verified that
yet): a connection is deemed "alive" if the select() waiting for data
did not time out.

However, it should be deemed alive only if there has been data on the
ssh connection itself - not on the local ends of a -L / -R tunnel or
whatever other local sockets might be waited upon with select().
As the above example shows, when the connection to the server is
effectively dead but a tunnel endpoint stays active, the failure is
never detected.
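
To make the timing concrete, here is a small Python simulation of the
behaviour described above. It is only a sketch under stated assumptions,
not the actual OpenSSH code: the name seconds_until_dead and all of its
parameters are made up for illustration, the peer is assumed to be dead
from time zero, and the connection is declared dead as soon as count_max
keepalives have gone unanswered (a simplification of the real counter).

```python
def seconds_until_dead(tunnel_period, alive_interval, count_max,
                       reset_on_any_fd, horizon=3600):
    """Simulate a select()-style loop against a peer that never answers.

    tunnel_period   -- seconds between local tunnel-socket wakeups
    alive_interval  -- keepalive interval (like ServerAliveInterval)
    count_max       -- unanswered keepalives tolerated (like ServerAliveCountMax)
    reset_on_any_fd -- True: any readable fd restarts the timer (reported bug)
                       False: only ssh-connection traffic would restart it
    Returns the simulated time of detection, or None if nothing is
    detected within `horizon` seconds. All names are hypothetical.
    """
    missed = 0
    deadline = alive_interval      # when the keepalive timer next expires
    next_tunnel = tunnel_period    # when the local tunnel fd next becomes readable

    while True:
        wake = min(deadline, next_tunnel)
        if wake > horizon:
            return None

        tunnel_ready = (wake == next_tunnel)
        if tunnel_ready:
            next_tunnel += tunnel_period

        if tunnel_ready and reset_on_any_fd:
            # select() did not time out, so the timer restarts even though
            # nothing arrived on the ssh connection itself.
            deadline = wake + alive_interval
            continue

        if wake == deadline:
            # The timer expired: one more keepalive went unanswered.
            missed += 1
            if missed >= count_max:
                return wake
            deadline = wake + alive_interval


if __name__ == "__main__":
    print(seconds_until_dead(2, 5, 2, reset_on_any_fd=True))    # 'sleep 2' case
    print(seconds_until_dead(30, 5, 2, reset_on_any_fd=True))   # 'sleep 30' case
    print(seconds_until_dead(2, 5, 2, reset_on_any_fd=False))   # timer tied to ssh traffic only
```

Under these assumptions the 'sleep 2' case is never detected within the
horizon, the 'sleep 30' case is detected after 10 simulated seconds, and
tying the timer to ssh-connection traffic alone gives 10 seconds even
with the 2 second trickle.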

This setup is artificial, and is easier to debug than a real world
setting. It includes:

- the ssh server
- an intermediate pipe ('nc 127.0.0.1 22') that can be kill -STOPped
without dropping the connection
- the ssh client
- a slow server that trickles data through a tunnel

In a real world scenario, the intermediate pipe is likely to be an
unreliable network connection (e.g. an intermediate router somewhere
along the way that is not directly connected to a client interface -
and that stops routing traffic in the middle of the session). If this
is the case, then eventually the ssh client will have a TCP timeout (2
mins, usually) and detect the broken connection, which I suppose is why
this was not previously reported. However, if there is no indication
that the intermediate connection died (as in the example I gave above),
then the ssh client will hang forever, despite the "ServerAlive*"
settings.

As I mentioned, this likely also applies to sshd and its
ClientAliveInterval and ClientAliveCountMax settings, though I haven't
verified it.
