ssh client does not timeout if the network fails after ssh_connect but before ssh_exchange_identification, even with Alive options set
Jiaying Zhang
jiayingz at google.com
Thu Jul 26 08:12:24 EST 2007
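Hello again,
Here is the patch I came up with to prevent the hang in
ssh_exchange_identification. When ServerAliveInterval is set, it waits with
select() before reading the server's version string and before writing ours,
and gives up after ServerAliveCountMax consecutive timeouts. I have tested it
only lightly, but it seems to solve the problem. Could anyone take a look at
the patch? Thanks a lot!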
--- sshconnect.c~old 2007-07-25 10:44:26.000000000 -0700
+++ sshconnect.c 2007-07-25 14:45:57.000000000 -0700
@@ -404,9 +404,28 @@ ssh_exchange_identification(void)
 	int minor1 = PROTOCOL_MINOR_1;
 	u_int i, n;
 
+	if (options.server_alive_interval) {
+		fd_set rfds;
+		struct timeval timeo = { .tv_usec=0 };
+		int read_timeouts, ret;
+
+		for (read_timeouts = 0;;) {
+			/* select() clears the set on timeout, so re-arm it on every pass. */
+			FD_ZERO(&rfds);
+			FD_SET(connection_in, &rfds);
+			timeo.tv_sec = options.server_alive_interval;
+			ret = select(connection_in+1, &rfds, NULL, NULL, &timeo);
+			if (ret < 0) {
+				fatal("ssh_exchange_identification: select read error: %.100s", strerror(errno));
+			} else if (ret == 0) {
+				if (++read_timeouts >= options.server_alive_count_max)
+					fatal("ssh_exchange_identification: Timeout, server not responding");
+			} else
+				break;
+		}
+
+	}
 	/* Read other side's version identification. */
-	struct timeval timeo = { .tv_sec=10, .tv_usec=0 };
-	setsockopt(connection_in, SOL_SOCKET, SO_SNDTIMEO, &timeo, sizeof(timeo));
 	for (n = 0;;) {
 		for (i = 0; i < sizeof(buf) - 1; i++) {
 			size_t len = atomicio(read, connection_in, &buf[i], 1);
@@ -490,6 +509,27 @@ ssh_exchange_identification(void)
 	    compat20 ? PROTOCOL_MAJOR_2 : PROTOCOL_MAJOR_1,
 	    compat20 ? PROTOCOL_MINOR_2 : minor1,
 	    SSH_VERSION);
+	if (options.server_alive_interval) {
+		fd_set wfds;
+		struct timeval timeo = { .tv_usec=0 };
+		int write_timeouts, ret;
+
+		for (write_timeouts = 0;;) {
+			/* select() clears the set on timeout, so re-arm it on every pass. */
+			FD_ZERO(&wfds);
+			FD_SET(connection_out, &wfds);
+			timeo.tv_sec = options.server_alive_interval;
+			ret = select(connection_out+1, NULL, &wfds, NULL, &timeo);
+			if (ret < 0) {
+				fatal("ssh_exchange_identification: select write error: %.100s", strerror(errno));
+			} else if (ret == 0) {
+				if (++write_timeouts >= options.server_alive_count_max)
+					fatal("ssh_exchange_identification: Timeout, server not responding");
+			} else
+				break;
+		}
+
+	}
 	if (atomicio(vwrite, connection_out, buf, strlen(buf)) != strlen(buf))
 		fatal("write: %.100s", strerror(errno));
 	client_version_string = xstrdup(buf);
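
For anyone who wants to exercise the timeout logic outside of ssh, the
wait-and-retry idea can be sketched as a standalone helper. This is only a
minimal sketch under a few assumptions: wait_fd_ready and its parameters are
hypothetical names, not OpenSSH functions, and interval_sec and count_max
merely stand in for ServerAliveInterval and ServerAliveCountMax. Re-arming the
fd_set on every pass and retrying on EINTR are choices made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of OpenSSH: wait until fd is readable
 * (for_write == 0) or writable (for_write != 0).
 * Returns 1 when the descriptor is ready, 0 after count_max consecutive
 * timeouts of interval_sec seconds each, and -1 on a select() error.
 */
static int
wait_fd_ready(int fd, int for_write, int interval_sec, int count_max)
{
	fd_set fds;
	struct timeval tv;
	int timeouts = 0, ret;

	assert(fd >= 0 && fd < FD_SETSIZE);
	assert(interval_sec > 0 && count_max > 0);

	for (;;) {
		/* select() overwrites the set and may change tv: re-arm both. */
		FD_ZERO(&fds);
		FD_SET(fd, &fds);
		tv.tv_sec = interval_sec;
		tv.tv_usec = 0;

		ret = select(fd + 1, for_write ? NULL : &fds,
		    for_write ? &fds : NULL, NULL, &tv);
		if (ret > 0)
			return 1;
		if (ret == 0) {
			if (++timeouts >= count_max)
				return 0;
			continue;
		}
		if (errno != EINTR)
			return -1;
		/* Interrupted by a signal: retry without counting a timeout. */
	}
}
```

Called on the read end of an empty pipe with interval_sec = 1 and
count_max = 2, the helper should return 0 after about two seconds; once a byte
has been written to the pipe it should return 1 right away. With the values
30 and 3 it would give up after roughly 90 seconds.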
Jiaying
On 7/24/07, Jiaying Zhang <jiayingz at google.com> wrote:
>
> Hello,
>
> I have recently been testing ssh with occasional network disconnections
> between server and client. I found that ssh sometimes hangs if the
> disconnection happens after the connection is established but before
> ssh_exchange_identification completes. The ssh configuration files show that
> both the client and server alive options are set.
> In /etc/ssh/ssh_config:
> # Send keepalive messages to the server. Disconnect after 90 seconds.
> ServerAliveInterval 30
> ServerAliveCountMax 3
> In /etc/ssh/sshd_config:
> # ClientAlive is more flexible and secure than TCPKeepAlive. (ssh2)
> # Send an alive message every 30 seconds, and disconnect after 90 seconds.
> ClientAliveInterval 30
> ClientAliveCountMax 3
>
> The ssh client kept hanging even after the network was restored. It finally
> timed out after about 2 hours because tcp_keepalive_time is set to 2
> hours in sysctl.
> I looked at the ssh code downloaded from your website and found that the Alive
> options are only used to set up timeouts after ssh_session starts. So my
> question is why we do not start monitoring the liveness of the ssh server right
> after a connection is established. It is annoying when an application relies
> on ssh to do periodic work but an occasional network failure causes the
> application to miss several service cycles because ssh hangs.
>
> Thanks a lot!
>
> Jiaying
>
>