Issues with ssh-agent connecting to a large number of hosts at once
Bob Belnap
bbelnap at gmail.com
Tue Apr 21 00:21:46 EST 2009
Thanks, Bob, for your detailed and informative response. Comments inline...
On Fri, Apr 17, 2009 at 7:48 PM, Bob Proulx <bob at proulx.com> wrote:
> I don't have a perfect understanding of this but not seeing anyone
> else say anything, I will jump in and make some suggestions, imperfect
> though they will be. Different types of kernels will handle this
> differently and will account for why different systems behave
> differently. But most have a limited amount of memory available for
> network resources. Quickly opening and closing network connections
> can cause memory to be consumed at a high rate. Once the available
> memory is exceeded, system calls fail for being out of resources until
> more resources are available. This is what you are seeing.
>
> Why do resources become consumed? Look at RFC793 and you will find
> the TCP state diagram. Look particularly at the TIME_WAIT state. You
> are probably creating many connections hanging around in the TIME_WAIT
> state after they are closed and until the timeout. Each of those
> consumes network memory. You can see these connections by looking at
> the state reported by netstat. (e.g. 'netstat | grep TIME_WAIT') If
> you see many connections in the TIME_WAIT state then this is what you
> are running into. In many kernels with a limited amount of network
> resources this limits the rate at which connections may be created and
> closed.
>
Connections aren't in the TIME_WAIT state; they are either CONNECTED or
CONNECTING (about evenly split).
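For anyone wanting to repeat this check, here is a minimal sketch (my own, not from either message) that tallies socket states from `netstat -an` output instead of grepping for one state at a time. It assumes Linux net-tools netstat, which prints TCP states such as TIME_WAIT and ESTABLISHED, and CONNECTED/CONNECTING for Unix domain sockets. ssh-agent is reached over a Unix socket, which is presumably where those two states come from. `count_states` is a hypothetical helper name.

```python
import subprocess
from collections import Counter

# State names printed by Linux net-tools netstat (assumption: other
# netstat implementations may use different spellings).
TCP_STATES = {
    "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2",
    "TIME_WAIT", "CLOSE", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING",
}
UNIX_STATES = {"LISTENING", "CONNECTING", "CONNECTED", "DISCONNECTING"}
KNOWN_STATES = TCP_STATES | UNIX_STATES


def count_states(netstat_output: str) -> Counter:
    """Count sockets per state; each socket line contributes at most one state."""
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        for token in line.split():
            if token in KNOWN_STATES:
                counts[token] += 1
                break  # header lines never match, socket lines match once
    return counts


def main() -> None:
    try:
        result = subprocess.run(
            ["netstat", "-an"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not run netstat: {exc}")
        return
    for state, n in count_states(result.stdout).most_common():
        print(f"{state:15} {n}")


if __name__ == "__main__":
    main()
```

Running it once while the mass ssh run is in progress and once after it finishes shows whether connections pile up in TIME_WAIT (the TCP limit described above) or in the Unix-socket states seen here.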
> This isn't really particular to ssh but is generic to anything that
> creates TCP connections. Since ssh uses TCP it has the same
> limitation as any other program that uses TCP and leaves connections
> in the TIME_WAIT state until they timeout and their resources are
> reclaimed.
Yes, I realize this is not an issue with ssh in particular, but since it is
triggered by ssh, I had hoped this group could more easily point out what
limit is being triggered. I am continuing to research the issue.
--Bob