Issues with ssh-agent connecting to a large number of hosts at once

Bob Belnap bbelnap at gmail.com
Tue Apr 21 00:21:46 EST 2009


Thanks, Bob, for your detailed and informative response.  Comments inline...

On Fri, Apr 17, 2009 at 7:48 PM, Bob Proulx <bob at proulx.com> wrote:

> I don't have a perfect understanding of this but not seeing anyone
> else say anything I will jump in and make some suggestions, imperfect
> though they will be.  Different kernels handle this differently,
> which accounts for why different systems behave differently.  But
> most have a limited amount of memory available for
> network resources.  Quickly opening and closing network connections
> can cause memory to be consumed at a high rate.  Once the available
> memory is exceeded system calls fail for being out of resources until
> more resources are available.  This is what you are seeing.
>
> Why do resources become consumed?  Look at RFC793 and you will find
> the TCP state diagram.  Look particularly at the TIME_WAIT state.  You
> are probably creating many connections hanging around in the TIME_WAIT
> state after they are closed and until the timeout.  Each of those
> consumes network memory.  You can see these connections by looking at
> the state reported by netstat.  (e.g. 'netstat | grep TIME_WAIT') If
> you see many connections in the TIME_WAIT state then this is what you
> are running into.  In many kernels with a limited amount of network
> resources this limits the rate at which connections may be created and
> closed.
>

Connections aren't in the TIME_WAIT state; they are either CONNECTED or
CONNECTING (about evenly split).
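
For anyone who wants to repeat the check, here is a rough sketch of how the
per-state tally could be scripted instead of grepping for one state at a
time.  It is not the exact procedure used above.  The function name, the
list of state names (as printed by Linux net-tools netstat), and the sample
lines with their addresses and socket paths are all assumptions made for
illustration.

```python
import sys
from collections import Counter

# Assumption: state names as printed by Linux net-tools netstat. Other
# platforms spell some of these differently (e.g. SYN_RCVD on BSD).
KNOWN_STATES = {
    # TCP socket states
    "LISTEN", "SYN_SENT", "SYN_RECV", "ESTABLISHED", "FIN_WAIT1",
    "FIN_WAIT2", "CLOSE_WAIT", "CLOSING", "LAST_ACK", "TIME_WAIT",
    # Unix domain socket states
    "CONNECTED", "CONNECTING", "LISTENING",
}


def count_states(netstat_output):
    """Return a Counter mapping each socket state to its number of lines."""
    counts = Counter()
    for line in netstat_output.splitlines():
        # The state column sits at a different position for TCP and Unix
        # sockets, so look for the first field that is a known state name.
        for field in line.split():
            if field in KNOWN_STATES:
                counts[field] += 1
                break
    return counts


# Hypothetical sample lines; the addresses and paths are made up.
SAMPLE = """\
tcp        0      0 10.0.0.5:22      10.0.0.9:51514   ESTABLISHED
tcp        0      0 10.0.0.5:51200   10.0.0.7:22      TIME_WAIT
unix  3      [ ]         STREAM     CONNECTED     12345    /tmp/ssh-XXXX/agent.1
unix  2      [ ]         STREAM     CONNECTING    0        /tmp/ssh-XXXX/agent.1
unix  3      [ ]         STREAM     CONNECTED     12346
"""

if __name__ == "__main__":
    # Intended use: netstat -an | python count_states.py
    for state, n in count_states(sys.stdin.read()).most_common():
        print(state, n)
```

Running this a few times while the connections are being opened should show
which state is piling up.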


> This isn't really particular to ssh but is generic to anything that
> creates TCP connections.  Since ssh uses TCP it has the same
> limitation as any other program that uses TCP and leaves connections
> in the TIME_WAIT state until they time out and their resources are
> reclaimed.


Yes, I realize this is not an issue with ssh in particular, but since it is
triggered by ssh, I had hoped this group could more easily point out what
limit is being triggered.  I am continuing to research the issue.

--Bob


More information about the openssh-unix-dev mailing list