Issues with ssh-agent connecting to a large number of hosts at once

Bob Proulx bob at proulx.com
Sat Apr 18 11:48:21 EST 2009


Bob Belnap wrote:
> It seems to me that I'm hitting some kind of kernel limit (open file limit
> perhaps?)  But I've fiddled with every sysctl value I can find, and haven't
> found the right magic. Anyone run into this or can offer further debugging
> suggestions?  (btw, ssh -v shows: OpenSSH_5.1p1 Debian-3ubuntu1, OpenSSL
> 0.9.8g)

I don't have a perfect understanding of this, but since no one else
has said anything I will jump in with some suggestions, imperfect
though they will be.  Different kernels handle this differently, which
accounts for why different systems behave differently.  But most have
a limited amount of memory available for network resources.  Quickly
opening and closing network connections can consume that memory at a
high rate.  Once the available memory is exhausted, system calls fail
for lack of resources until more become available.  This is what you
are seeing.

Why do resources become consumed?  Look at RFC 793 and you will find
the TCP state diagram.  Look particularly at the TIME_WAIT state.  You
are probably creating many connections that linger in the TIME_WAIT
state after they are closed, until the timeout expires.  Each of those
consumes network memory.  You can see these connections by looking at
the state reported by netstat (e.g. 'netstat | grep TIME_WAIT').  If
you see many connections in the TIME_WAIT state then this is what you
are running into.  In many kernels with a limited amount of network
resources this limits the rate at which connections may be created and
closed.
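
If grepping the netstat output gets unwieldy, something like the
following could tally connections by state instead.  This is only a
minimal sketch, assuming a Linux system where /proc/net/tcp and
/proc/net/tcp6 are readable.  The function names are made up for
illustration; the hex codes are the ones the Linux kernel prints in
the "st" column of those files.

```python
from collections import Counter

# Linux kernel TCP state codes as printed in the "st" column
# of /proc/net/tcp (assumption: Linux only).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT",   "03": "SYN_RECV",
    "04": "FIN_WAIT1",   "05": "FIN_WAIT2",  "06": "TIME_WAIT",
    "07": "CLOSE",       "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN",      "0B": "CLOSING",
}


def count_tcp_states(lines):
    """Count connections per TCP state from /proc/net/tcp-style lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header row and any blank or truncated lines.
        if len(fields) < 4 or fields[0] == "sl":
            continue
        code = fields[3].upper()
        # Unknown codes are counted under their raw hex value.
        counts[TCP_STATES.get(code, code)] += 1
    return counts


def read_proc_tcp(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Read the kernel's TCP tables, ignoring files that are missing."""
    lines = []
    for path in paths:
        try:
            with open(path) as f:
                lines.extend(f)
        except OSError:
            pass
    return lines


if __name__ == "__main__":
    for state, n in count_tcp_states(read_proc_tcp()).most_common():
        print(f"{n:6d} {state}")
```

Running it repeatedly while the connections are being launched should
show whether the TIME_WAIT count climbs as the failures begin.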

I am not familiar with TakTuk but it appears to try to avoid this
problem by spreading the load around.  That is good.  But perhaps you
are still exceeding the system limits.  It appears to me that you are.

This isn't really particular to ssh but is generic to anything that
creates TCP connections.  Since ssh uses TCP it has the same
limitation as any other program that uses TCP and leaves connections
in the TIME_WAIT state until they time out and their resources are
reclaimed.
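
To get a feel for the numbers, here is a back-of-the-envelope sketch.
It assumes the local side closes each connection first (so it is the
one left holding the TIME_WAIT entry) and that connections are opened
and closed at a steady rate.  The 60 second default is the value Linux
uses; RFC 793 itself specifies twice the maximum segment lifetime,
which works out to four minutes.  The function names and the
"table_limit" parameter are hypothetical, and the real limit depends
on the kernel and its configuration.

```python
def steady_state_time_wait(conn_per_sec, time_wait_secs=60.0):
    """Approximate number of entries sitting in TIME_WAIT at a steady rate.

    Each closed connection lingers for time_wait_secs, so the backlog
    is simply rate multiplied by duration.
    """
    return conn_per_sec * time_wait_secs


def max_sustained_rate(table_limit, time_wait_secs=60.0):
    """Highest steady connection rate that keeps the backlog under table_limit.

    table_limit is a hypothetical cap on lingering connections, standing
    in for whatever resource limit the kernel actually enforces.
    """
    return table_limit / time_wait_secs


if __name__ == "__main__":
    for rate in (10, 100, 1000):
        backlog = steady_state_time_wait(rate)
        print(f"{rate:5d} conn/s -> about {backlog:.0f} in TIME_WAIT")
```

The point is only that the backlog grows linearly with the connection
rate, so a tool that opens connections in a burst can hit a limit that
the same number of connections spread over time would not.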

Hope that helps.

Bob

