question on scalability

Damien Miller djm at
Sat Nov 22 10:15:02 EST 2003

On Sat, 2003-11-22 at 04:20, Andrey Ermolinskiy wrote:
> Hello All,
> We have a Linux cluster application that uses openssh as its inter-node
> communication mechanism and we've recently run into a problem that points
> to a potential scalability issue in openssh code.
> Our client nodes systematically open ssh connections to the server node to
> execute an administrative command. When establishing socket connections,
> the server side sometimes fails to complete the TCP handshake with some of
> the clients.  The final ACK coming from the client node would sometimes be
> dropped by server-side TCP, and the corresponding connection would never be
> added to sshd's accept queue. This leaves the ssh client command in a hung
> state, as it has completed its part of the TCP handshake and is ready to
> exchange data over the socket.

This sounds like a TCP problem, not an ssh problem. If the ACK is dropped
by the server end, shouldn't the client just resend it?

> This problem reveals itself in situations where 64 or more client nodes
> issue concurrent ssh requests to the server.
> Looking at sshd.c, I noticed that the daemon's listen socket is created
> with a very short backlog value (5), and we are certain that this is the
> cause of our problem. Is there a reason for using such a small value, as
> opposed to setting the backlog to SOMAXCONN?

I'm not sure why the backlog is set so low; perhaps it is meant to offer
some mitigation against connection-flooding DoS attacks. Markus?
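
For reference, the change being asked about comes down to the second
argument of listen(2). Below is a minimal sketch, assuming a plain IPv4
listener on the loopback address. open_listener() is a hypothetical
helper written for this illustration; it is not code from sshd.c.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from sshd.c): open a TCP listening
 * socket on 127.0.0.1 with a caller-chosen backlog.  Pass port 0 to
 * let the kernel pick a free port.  Returns the listening descriptor,
 * or -1 on error.
 */
int
open_listener(unsigned short port, int backlog)
{
	struct sockaddr_in addr;
	int fd, on = 1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd == -1)
		return -1;

	/* Best effort: allow quick rebinding of a recently used port. */
	(void)setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port);

	/* The backlog is a hint for the pending-connection queue length. */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
	    listen(fd, backlog) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Calling open_listener(0, 5) mirrors the small backlog described above,
while open_listener(0, SOMAXCONN) requests the system-defined maximum.
The kernel treats the value as a hint and may cap it, so the effective
queue length can differ from what is requested.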


More information about the openssh-unix-dev mailing list