ssh client does not time out if the network fails after ssh_connect but before ssh_exchange_identification, even with Alive options set

Wed Jul 25 08:33:09 EST 2007

Hello,

These days I have been testing ssh with occasional network disconnections
between the server and the client. I found that ssh sometimes hangs if the
disconnection happens after the connection is established but before
ssh_exchange_identification completes. The ssh configuration files show that
both the client and the server alive options are set.
In /etc/ssh/ssh_config:
# Send keepalive messages to the server. Disconnect after 90 seconds.
  ServerAliveInterval 30
  ServerAliveCountMax 3
In /etc/ssh/sshd_config:
# ClientAlive is more flexible and secure than TCPKeepAlive. (ssh2)
# Send an alive message every 30 seconds, and disconnect after 90 seconds.
ClientAliveInterval 30
ClientAliveCountMax 3

The ssh client kept hanging even after the network was restored. It finally
timed out after about 2 hours, because tcp_keepalive_time is set to 2 hours
(7200 seconds) in sysctl.
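The gap between the two mechanisms is easy to put in numbers. Below is a minimal sketch, assuming the stock Linux values for tcp_keepalive_intvl (75 s) and tcp_keepalive_probes (9), which are not mentioned above, and assuming the socket has SO_KEEPALIVE enabled (ssh's TCPKeepAlive option, which defaults to yes). The helper names are hypothetical, and the ssh figure uses the "interval times count" approximation given in ssh_config(5).

```python
# Assumed stock Linux TCP keepalive sysctls; only tcp_keepalive_time
# comes from the report, the other two are assumed defaults.
TCP_KEEPALIVE_TIME = 7200   # seconds idle before the first probe
TCP_KEEPALIVE_INTVL = 75    # seconds between unanswered probes
TCP_KEEPALIVE_PROBES = 9    # unanswered probes before the connection is dropped


def tcp_keepalive_detection(time_s: int, intvl_s: int, probes: int) -> int:
    """Seconds of silence before the kernel declares the peer dead (hypothetical helper)."""
    return time_s + intvl_s * probes


def ssh_alive_detection(interval_s: int, count_max: int) -> int:
    """Approximate seconds before ssh gives up, per ssh_config(5) (hypothetical helper)."""
    return interval_s * count_max


print(tcp_keepalive_detection(TCP_KEEPALIVE_TIME, TCP_KEEPALIVE_INTVL, TCP_KEEPALIVE_PROBES))  # 7875
print(ssh_alive_detection(30, 3))  # 90
```

So, under these assumptions, a silent server is only detected after 7875 s (about 2 h 11 min), consistent with the roughly 2 hour hang above, compared with about 90 s once the ServerAlive checks are running.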
I looked at the ssh source code downloaded from your website and found that
the Alive options are only used to set up timeouts after ssh_session starts.
So my question is: why doesn't ssh start monitoring the liveness of the ssh
server right after the connection is established? This is a real problem when
an application relies on ssh to do periodic work: an occasional network
failure makes ssh hang, and the application misses several service cycles.
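To illustrate the kind of deadline I have in mind for the identification phase, here is a minimal Python sketch. It is not OpenSSH code: read_server_banner() and its timeout parameter are hypothetical, and it only assumes an already connected socket. Following RFC 4253 section 4.2, it skips any lines the server sends before its "SSH-" version line and rejects lines longer than 255 characters.

```python
import socket
import time

MAX_LINE = 255  # RFC 4253: identification string is at most 255 chars incl. CR LF


def read_server_banner(sock: socket.socket, timeout: float) -> str:
    """Return the server's "SSH-..." identification line, or raise TimeoutError
    if it has not fully arrived within `timeout` seconds.

    Hypothetical helper: a sketch of a deadline on the identification
    exchange, not OpenSSH's actual implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        line = bytearray()
        # Read one byte at a time so we never consume bytes that belong
        # to the key exchange that follows the identification line.
        while not line.endswith(b"\n"):
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("no SSH identification within %.1f s" % timeout)
            sock.settimeout(remaining)
            try:
                byte = sock.recv(1)
            except socket.timeout:
                raise TimeoutError("no SSH identification within %.1f s" % timeout) from None
            if not byte:
                raise ConnectionError("connection closed before identification")
            line += byte
            if len(line) > MAX_LINE:
                raise ValueError("identification line too long")
        text = line.decode("ascii", errors="replace").rstrip("\r\n")
        # Servers may send other lines first; only the "SSH-" line matters.
        if text.startswith("SSH-"):
            return text
```

On TimeoutError, a caller would close the socket and retry, so a dropped link during this phase costs one timeout period instead of the roughly two hours described above.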

Thanks a lot!

Jiaying