question on scalability

Damien Miller djm at mindrot.org
Sat Nov 22 10:15:02 EST 2003


On Sat, 2003-11-22 at 04:20, Andrey Ermolinskiy wrote:
> Hello All,
> 
> We have a Linux cluster application that uses openssh as its inter-node
> communication mechanism and we've recently run into a problem that points
> to a potential scalability issue in openssh code.
> 
> Our client nodes systematically open ssh connections to the server node to
> execute an administrative command. When establishing socket connections,
> the server side sometimes fails to complete the TCP handshake with some of
> the clients.  The final ACK coming from the client node would sometimes be
> dropped by server-side TCP, and the corresponding connection would never be
> added to sshd's accept queue. This leaves the ssh client command in a hung
> state, as it has completed its part of the TCP handshake and is ready to
> exchange data over the socket.

This sounds like a TCP problem, not an ssh problem. If the ACK is dropped
by the server end, shouldn't the client just resend it?
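
For what it's worth, here is a rough sketch of how a client could guard
against the hang described above. It is not taken from ssh or sshd, and
it does not reproduce the dropped ACK itself. It only illustrates that
connect() can return before the server application has accepted
anything, so the client needs a timeout on its first read. The function
name read_banner and the 256-byte read size are made up for
illustration.

```python
import socket


def read_banner(host, port, timeout=2.0):
    """Connect, then wait up to `timeout` seconds for the server's first bytes.

    Returns the bytes received, or None if nothing arrived in time.
    (Hypothetical helper for illustration; not part of OpenSSH.)
    """
    # create_connection applies `timeout` to connect() and to later reads.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            # An sshd that accepted the connection would send its version
            # banner here; a connection that was never accepted stays silent.
            return sock.recv(256)
        except socket.timeout:
            return None
```

Against a listener that never calls accept(), the connect still succeeds
(the kernel completes the handshake on its own) and the read then times
out, so the function returns None instead of blocking forever.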

> This problem reveals itself in situations where 64 or more client nodes
> issue concurrent ssh requests to the server.
> 
> Looking at sshd.c, I noticed that the daemon's listen socket is created
> with a very short backlog value (5), and we are certain that this is the
> cause of our problem. Is there a reason for using such a small value, as
> opposed to setting the backlog to SOMAXCONN?

I'm not sure why the backlog is set low; perhaps it is meant to offer some
mitigation against connection-flooding DoS attacks. Markus?
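
To make the suggestion concrete, here is a minimal sketch of a listener
that passes SOMAXCONN to listen() rather than a small constant. This is
an illustration in Python, not the code in sshd.c; the helper name
open_listener and its defaults are assumptions. The kernel may also cap
whatever value is passed (on Linux, via net.core.somaxconn), so a larger
argument alone may not be enough.

```python
import socket


def open_listener(host="127.0.0.1", port=0, backlog=socket.SOMAXCONN):
    """Return a TCP socket listening on (host, port) with the given backlog.

    Hypothetical helper for illustration. port=0 asks the OS to pick a
    free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts without waiting for old sockets to time out.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # The backlog hints how many completed connections may wait for accept().
    sock.listen(backlog)
    return sock
```

Calling open_listener(backlog=5) instead would mimic the small value
mentioned above, which makes it easy to compare the two settings under
a burst of concurrent connects.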

-d

More information about the openssh-unix-dev mailing list