ssh client does not timeout if the network fails after ssh_connect but before ssh_exchange_identification, even with Alive options set

Dan Yefimov dan at ns15.lightwave.net.ru
Fri Jul 27 08:13:38 EST 2007


On Thu, 26 Jul 2007, Jiaying Zhang wrote:

> Hi Dan,
> 
> Thanks for looking at the patch. I changed the code as you suggested. The
> new patch is attached below. I have a little concern about restarting
> select() in case of EINTR, because it can postpone timeout.

Yes, it can, but there is nothing to do with that (except for at least Linux, 
which always returns remaining timeout in the select() timeout parameter). 
Fortunately signals are relatively rare, so the timeout deviation is IMHO not 
essential. There are also other possible reasons for deviation, for example
sending the packet to another end while network is down.

> It may cause
> some problem if people schedule their jobs based on the expected ssh
> timeout.

I would rather believe people take some actions on ssh returning failure,
rather than schedule actions after exact timeout intervals.

> I also noticed that in server_alive_check() function, we check
> if (++server_alive_timeouts > options.server_alive_count_max). Should it be
> >=? According to the comment in ssh_config, if ServerAliveInterval equals to
> 30 and ServerAliveCountMax equals to 3, ssh disconnects after 90 seconds.
> But with the current server_alive_check, we actually disconnects after 120
> seconds.
> 
Not right. Pay attention at '++server_alive_timeouts > 
options.server_alive_count_max' expression. server_alive_timeouts is 
pre-incremented there, that is it's incremented value is used while evaluating 
the expression, so the count begins with 1 rather than 0 there.

Some notices I (unfortunately) forgot to meantion previously:

> diff -puN sshconnect.c~old sshconnect.c
> --- sshconnect.c~old    2007-07-25 16:59:36.000000000 -0700
> +++ sshconnect.c        2007-07-26 11:46:48.000000000 -0700
> @@ -404,6 +404,28 @@ ssh_exchange_identification(void)
>         int minor1 = PROTOCOL_MINOR_1;
>         u_int i, n;
> 
> +       if (options.server_alive_interval) {
> +               fd_set rfds;
> +               struct timeval timeo;
> +               int read_timeouts, ret;
> +
> +               FD_SET(connection_in, &rfds);
		  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(1) This should be placed inside the cycle before select() since on error (even 
EINTR) rfds along with timeo becomes undefined according to spec.

> +               for (read_timeouts = 0;;) {
> +                       timeo.tv_sec = options.server_alive_interval;
> +                       timeo.tv_usec = 0;
> +                       ret = select(connection_in+1, &rfds, NULL, NULL, &timeo);
> +                       if (ret < 0) {
> +                               if (errno == EINTR)
> +                                       continue;
> +                               fatal("ssh_exchange_identification: select read error: %.100s", strerror(errno));
> +                       } else if (ret == 0) {
> +                               if (++read_timeouts >= options.server_alive_count_max)
				      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(2) Replace '>=' with '>' here because of discussion above.

> +                                       fatal("ssh_exchange_identification: Timeout, server not responding");
> +                       } else
> +                               break;
> +               }
> +
> +       }
>         /* Read other side's version identification. */
>         for (n = 0;;) {
>                 for (i = 0; i < sizeof(buf) - 1; i++) {
> @@ -488,6 +510,27 @@ ssh_exchange_identification(void)
>             compat20 ? PROTOCOL_MAJOR_2 : PROTOCOL_MAJOR_1,
>             compat20 ? PROTOCOL_MINOR_2 : minor1,
>             SSH_VERSION);
> +       if (options.server_alive_interval) {
> +               fd_set wfds;
> +               struct timeval timeo = { .tv_usec=0 };
> +               int write_timeouts, ret;
> +
> +               FD_SET(connection_out, &wfds);
		  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The same notice here as notice (1) above.

> +               for (write_timeouts = 0;;) {
> +                       timeo.tv_sec = options.server_alive_interval;
> +                       ret = select(connection_out+1, NULL, &wfds, NULL, &timeo);
> +                       if (ret < 0) {
> +                               if (errno == EINTR)
> +                                       continue;
> +                               fatal("ssh_exchange_identification: select write error: %.100s", strerror(errno));
> +                       } else if (ret == 0) {
> +                               if (++write_timeouts >= options.server_alive_count_max)
				      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The same notice here as notice (2) above.

> +                                       fatal("ssh_exchange_identification: Timeout, server not responding");
> +                       } else
> +                               break;
> +               }
> +
> +       }
>         if (atomicio(vwrite, connection_out, buf, strlen(buf)) != strlen(buf))
>                 fatal("write: %.100s", strerror(errno));
>         client_version_string = xstrdup(buf);
> 

-- 

    Sincerely Your, Dan.



More information about the openssh-unix-dev mailing list