SuSE Linux Enterprise Server OpenSSH 5.1p1 nagle issue?

Jeremy Guthrie jeremy.guthrie at cdw.com
Thu Oct 18 06:11:34 EST 2012


I have a system in place where TCP appears to change behavior
drastically mid-stream on existing SSH sessions. We first noticed the
issue with an application using an SSH forward.  However, we were able
to rule the application out by reproducing the same TCP characteristics
with a perl script that dumps text to a terminal, simulating a large
data flow from the far end (ssh server) back to us (ssh client).

The issue manifests roughly as follows:
1.  Generate a burst of terminal output (about 500k)
2.  Sleep 15 seconds
3.  Go back to step 1
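
A minimal sketch of the loop above, written in Python rather than the
perl script mentioned earlier (which is not shown here). The filler
line, the function names, and the exact byte count are assumptions made
for illustration only; run it on the ssh server inside the session
being observed.

```python
import sys
import time

BURST_BYTES = 500 * 1024   # roughly the 500k of output from step 1
PAUSE_SECONDS = 15         # the sleep from step 2
LINE = "x" * 79 + "\n"     # 80-byte filler line (arbitrary choice)


def emit_burst(out, total_bytes=BURST_BYTES):
    """Write at least total_bytes of filler text to out; return bytes written."""
    written = 0
    while written < total_bytes:
        out.write(LINE)
        written += len(LINE)
    out.flush()
    return written


def run(out=sys.stdout, bursts=None, pause=PAUSE_SECONDS):
    """Repeat burst-then-sleep; bursts=None loops until interrupted."""
    count = 0
    while bursts is None or count < bursts:
        emit_burst(out)      # step 1
        time.sleep(pause)    # step 2
        count += 1           # step 3: go around again
    return count


if __name__ == "__main__":
    run()
```

Leaving it running while watching throughput on the client side should
produce the same burst/idle pattern as steps 1-3.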

After repeating steps 1-3 for some random amount of time (sometimes 3
minutes, sometimes 50+), the SSH server goes from streaming the output
back to the client at 4-4.5 Mbps (normal-behavior.png) down to
30-40 kbps (bad-behavior.png).  Most of the time, SSH stays in this
30-40 kbps state for as long as there is data in the TCP queue.  During
peaks, netstat shows the queue holding 90-100k of data waiting to be
transmitted.

We think that Nagle may be taking effect randomly for some reason. When
I run 'strace -f ssh user@hostname', I don't see the TCP_NODELAY flag
being set, so that could certainly be true.  I looked in the ssh docs
and don't see anything about NoDelay, but there used to be something
according to the O'Reilly docs.  When I examine the source code, it
looks like setting TCP_NODELAY is some kind of default.
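
For reference, TCP_NODELAY is a per-socket option set with setsockopt()
at the IPPROTO_TCP level, so a client that enables it would be expected
to show a setsockopt call naming TCP_NODELAY in its strace output. The
sketch below is not OpenSSH code; it only illustrates, with
hypothetical helper names, how the option is set and read back on an
ordinary TCP socket.

```python
import socket


def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is set on sock (Nagle disabled)."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


def disable_nagle(sock):
    """Set TCP_NODELAY so small writes are sent without Nagle coalescing."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("before:", nodelay_enabled(s))
        disable_nagle(s)
        print("after:", nodelay_enabled(s))
    finally:
        s.close()
```

Running the script itself under strace is one way to see what the
corresponding setsockopt line looks like on a given system.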

The odd thing is that I have hundreds of boxes running this same release 
of software and none of the others is exhibiting this issue.

Does anyone have any ideas?

-- 
*Jeremy Guthrie*


More information about the openssh-unix-dev mailing list