SuSE Linux Enterprise Server OpenSSH 5.1p1 Nagle issue?
jeremy.guthrie at cdw.com
Thu Oct 18 06:11:34 EST 2012
I have a system where TCP appears to change its behavior drastically
mid-stream on existing SSH sessions. We first noticed the issue with an
application that uses an SSH port forward. However, we were able to rule
the application out by reproducing the same TCP characteristics with a
Perl script that dumps text to a terminal, simulating a large data flow
from the far end (SSH server) back to us (SSH client).
The issue manifests roughly as follows:
1. Generate a burst of terminal output (500k)
2. Sleep 15 seconds
3. Go back to step 1
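The original load generator was a Perl script that wasn't posted, so the following Python sketch is only a hypothetical stand-in for the loop above. The burst size (reading "500k" as 500 KiB), the filler text, and the names `make_burst` and `run` are assumptions for illustration.

```python
import sys
import time
from typing import Optional, TextIO

# Hypothetical parameters mirroring steps 1 and 2 ("500k" read as 500 KiB).
BURST_BYTES = 500 * 1024
PAUSE_SECONDS = 15


def make_burst(size: int, width: int = 80) -> str:
    """Return exactly `size` characters of filler text, in lines of `width` chars."""
    line = "x" * (width - 1) + "\n"
    repeats, remainder = divmod(size, len(line))
    return line * repeats + line[:remainder]


def run(iterations: Optional[int] = None,
        size: int = BURST_BYTES,
        pause: float = PAUSE_SECONDS,
        out: TextIO = sys.stdout) -> None:
    """Step 1: write a burst; step 2: sleep; step 3: repeat (forever if iterations is None)."""
    count = 0
    while iterations is None or count < iterations:
        out.write(make_burst(size))
        out.flush()
        time.sleep(pause)
        count += 1
```

To use it, save it on the server as a hypothetical `burst.py` and run `python3 -c 'import burst; burst.run()'` inside the SSH session, so the output streams back to the client's terminal.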
After repeating steps 1-3 for a random amount of time (sometimes 3
minutes, sometimes 50+), the SSH server drops from streaming output
back to the client at 4-4.5 Mbps (normal-behavior.png) to 30-40 kbps
(bad-behavior.png). Most of the time, SSH stays in this 30-40 kbps
state for as long as there is data in the TCP queue; i.e., during
peaks, netstat shows 90-100k of data queued waiting to be transmitted.
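To log the queue depth over time instead of polling netstat by hand, one option on Linux is to read `/proc/net/tcp` directly. Its `tx_queue` column holds the bytes queued on a socket but not yet acknowledged by the peer, which is the figure netstat reports as Send-Q for established connections. The sketch below is a hypothetical helper, not part of OpenSSH. It assumes sshd listens on port 22 and only covers IPv4 (IPv6 sockets are in `/proc/net/tcp6`). The names `QueueSample`, `parse_line`, `ssh_send_queues`, and `watch` are illustrative.

```python
import time
from typing import List, NamedTuple, Optional

TCP_ESTABLISHED = 0x01  # state code used in /proc/net/tcp


class QueueSample(NamedTuple):
    local_port: int
    remote_port: int
    state: int
    tx_queue: int  # bytes queued but not yet acknowledged by the peer
    rx_queue: int  # bytes received but not yet read by the application


def parse_line(line: str) -> Optional[QueueSample]:
    """Parse one /proc/net/tcp row; return None for the header or malformed rows."""
    fields = line.split()
    if len(fields) < 5 or not fields[0].endswith(":"):
        return None
    # Addresses look like "0100007F:0016"; the port after the colon is hex.
    local_port = int(fields[1].split(":")[1], 16)
    remote_port = int(fields[2].split(":")[1], 16)
    tx_hex, rx_hex = fields[4].split(":")
    return QueueSample(local_port, remote_port, int(fields[3], 16),
                       int(tx_hex, 16), int(rx_hex, 16))


def ssh_send_queues(path: str = "/proc/net/tcp", port: int = 22) -> List[QueueSample]:
    """Return established sockets whose local port is `port` (the sshd side)."""
    with open(path) as f:
        samples = [parse_line(line) for line in f]
    return [s for s in samples
            if s is not None and s.local_port == port and s.state == TCP_ESTABLISHED]


def watch(interval: float = 1.0, port: int = 22) -> None:
    """Print a timestamped Send-Q sample per SSH connection until interrupted."""
    while True:
        for s in ssh_send_queues(port=port):
            print(f"{time.strftime('%H:%M:%S')} client_port={s.remote_port} "
                  f"send_q={s.tx_queue}")
        time.sleep(interval)
```

Running `watch()` on the server during a test should show whether the 30-40 kbps periods coincide with a Send-Q that stays in the 90-100k range. Timestamped output is also easier to line up against packet captures than occasional netstat snapshots.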
We think Nagle's algorithm may be kicking in randomly for some reason.
When I run 'strace -f ssh user@hostname', I don't see the TCP_NODELAY
flag being set, so that could certainly be the case. I looked in the
ssh docs and didn't see anything about NoDelay, although according to
the O'Reilly docs there used to be such an option. Yet when I examine
the source code, setting TCP_NODELAY looks like some kind of default.
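For reference, Nagle's algorithm holds back a segment smaller than the MSS while earlier data is still unacknowledged, and the TCP_NODELAY socket option disables that behavior. The sketch below has two parts. The first is a simplified model of that rule that ignores delayed ACKs, corking, and window limits. The second is a helper that sets the option on a socket and reads it back. Neither is OpenSSH's code, and `nagle_would_delay` and `enable_nodelay` are hypothetical names. On a real ssh binary, the equivalent check is looking in the strace output for a setsockopt call that names TCP_NODELAY.

```python
import socket


def nagle_would_delay(pending: int, mss: int, unacked: int, nodelay: bool) -> bool:
    """Simplified Nagle rule: hold a sub-MSS write while earlier data is unacknowledged."""
    if nodelay:
        return False  # TCP_NODELAY set: small segments go out immediately
    return pending < mss and unacked > 0


def enable_nodelay(sock: socket.socket) -> bool:
    """Set TCP_NODELAY on a TCP socket and report whether the kernel now has it enabled."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

For example, `enable_nodelay(socket.socket(socket.AF_INET, socket.SOCK_STREAM))` works on an unconnected socket. Under this simplified model, a write of at least one full MSS is never held back. A backlog of 90-100k would therefore mostly go out as full-sized segments whether or not Nagle is active.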
The odd thing is that I have hundreds of boxes running this same
software release, and none of the others exhibits this issue.
Does anyone have any ideas?