HPN Patch for OpenSSH 4.2p1 Available
Chris Rapier
rapier at psc.edu
Tue Oct 11 01:51:10 EST 2005
Here is what my test looked like. I used /dev/zero as a data source and
/dev/null as a sink to avoid any disk I/O issues. Compression was
disabled to avoid any data compression on the stream, and the cipher was
RC4 (arcfour).
A standard 4.1 server was running on port 22228 and a 4.1HPN server was
running on port 22229. The machine is a dual Xeon 2GHz box with 1GB of
RAM, running Linux 2.4.29, with the receive buffer set to 4MB.
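For reference, the 4MB receive buffer was set through the usual Linux
sysctls; something along these lines (from memory, so the exact values
may be slightly off):

sysctl -w net.core.rmem_max=4194304
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"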
The command used was:

time for i in 1 2 3 4 5 6 7 8 9 0; do \
  head -c 100000000 /dev/zero | \
  ./ssh -p 2222[8|9] -o compression=no -o ciphers=arcfour localhost \
    "cat - > /dev/null"; \
done
4.1p1 -> 4.1p1
real 0m22.300s
user 0m18.160s
sys 0m4.790s
4.1 -> 4.1HPN
real 0m21.001s
user 0m16.590s
sys 0m4.790s
4.1HPN -> 4.1
real 0m21.982s
user 0m17.380s
sys 0m4.950s
4.1HPN -> 4.1HPN
real 0m20.557s
user 0m16.380s
sys 0m4.770s
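(For scale: each loop pushes 10 x 100MB, i.e. about 1GB, through ssh, so
the 4.1p1 -> 4.1p1 run works out to roughly 1000MB / 22.3s, or about
45MB/s.)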
Those numbers are about what I'd expect from a localhost connection.
There doesn't seem to be any statistically significant variation. Of
course, it's only a run of ten connections, so I'll be rerunning it with
a couple hundred connections and see what that looks like. I might also
time each connection individually and get some SD values, just for
curiosity's sake.
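A rough sketch of how I'd do the per-connection timing (untested, assumes
GNU time is available; the port and iteration count are placeholders):

rm -f times.txt
for i in $(seq 1 200); do
    /usr/bin/time -f "%e" -a -o times.txt sh -c \
      'head -c 100000000 /dev/zero |
       ./ssh -p 22229 -o compression=no -o ciphers=arcfour localhost "cat - > /dev/null"'
done
# population mean/SD of the per-connection wall-clock times
awk '{ s += $1; ss += $1 * $1 }
     END { m = s / NR; printf "mean %.3fs  sd %.3fs\n", m, sqrt(ss / NR - m * m) }' times.txt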
Now, here is the same test running between two machines connected via
GigE on the same LAN (different subnets, though, with an average RTT of
0.859ms). The source machine, I believe, is a single-CPU Xeon 2.4GHz with
1GB of RAM, running Linux 2.6.13. The sink is the machine referenced
above.
4.1p1 -> 4.1p1
real 0m29.528s
user 0m17.703s
sys 0m4.276s
4.1 -> 4.1HPN
real 0m22.874s
user 0m17.202s
sys 0m4.553s
4.1HPN -> 4.1
real 0m28.942s
user 0m17.370s
sys 0m4.134s
4.1HPN -> 4.1HPN
real 0m22.315s
user 0m16.621s
sys 0m4.614s
So where you saw a performance decrease, I'm seeing an improvement of
roughly 22%. I'm going to rerun all of these tests with more iterations,
but right now I'm, unfortunately, not seeing the same problems you are.
Unfortunate because it just makes this a more complicated question :\
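(For the record, the 22% is just from the real times above: (29.528 -
22.874) / 29.528 is about 22.5% comparing 4.1p1 -> 4.1p1 against 4.1 ->
4.1HPN; stock-to-stock against HPN-to-HPN works out to about 24%.)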
My gut feeling is that the tests, while somewhat different in their
details, are roughly equivalent in method. However, maybe there is a
problem with my methodology. Let me know if you see anything.
Chris
Darren Tucker wrote:
> Chris Rapier wrote:
>
>>The performance hit is, initially, somewhat surprising, but on reflection
>>I think this is mostly going to be dependent on how the systems' TCP
>>stacks are tuned. I can see performance decreasing in the LAN if the
>>receive buffer is too high. An easy test would be to run some iPerfs and
>>see if the system TCP receive buffer shows a similar performance hit
>>versus setting the iPerf window to 64k. I'll run some similar tests
>>to see if I'm right.
>
>
> I think that what's happening is that on the source side, the buffers
> are growing larger than is necessary and ssh is spending
> disproportionately more time managing them. This would explain why I
> saw more impact for faster cpus: they can fill the buffers faster and
> thus are affected more.
>
> Going back to my earlier data:
>              -current      -hpn11
> real    2m40.966s    2m57.296s   (10.2% slower)
> user    0m33.187s    0m45.820s
> sys     0m31.500s    0m37.539s
>
> Note that there is a significant increase in user time. If the
> difference was purely due to stack tuning, I'd expect user time to be
> similar.
>
> I'm also going to profile ssh, but my bet is that extra time is in the
> buffer functions.
>
>
>>However, this should only be an issue with non-autotuning kernels.
>>Autotuning kernels such as Linux 2.4.26+, 2.6+, and Windows Vista (née
>>Longhorn) will adjust the receive buffer to maximize throughput. Since
>>HPN-SSH is auto-tuning aware this shouldn't be a problem on those
>>systems. On non-autotuning kernels appropriate use of the -w option
>>should resolve this.
>
>
> The Linux kernel used for the test was 2.6.12-1.1376_FC3.
>
>
>>Again, I'll need to test this but I'm pretty sure thats the issue.
>>
>>Darren Tucker wrote:
>>
>>>Chris Rapier wrote:
>>>
>>>
>>>>As a note, we now have HPN patch for OpenSSH 4.2 at
>>>>http://www.psc.edu/networking/projects/hpn-ssh/
>>>>
>>>>It's still part of the last set of patches (HPN11) so there aren't any
>>>>additional changes in the code. It patches, configures, compiles, and
>>>>passes make tests without a problem. I've not done extensive testing
>>>>for this version of openssh but I don't foresee any problems.
>>>>
>>>>I did run a couple of tests between two patched 4.2 installs (one in
>>>>Switzerland, the other in Pennsylvania, USA) and hit around 12MB/s
>>>>with the hpn patch and 500KB/s with the standard install. So it still
>>>>seems to work as expected.
>>>
>>>
>>>Have you done any analysis of the impact of the patch on
>>>low-latency/BDP links (ie LANs)?
>>>
>>>I've been doing some work with parts of the HPN patch. For scp'ing to
>>>and from localhost and LANs (see below), the scp changes on their own
>>>(actually, a variant thereof) shows a modest improvement of 2-3% in
>>>throughput. That's good.
>>>
>>>For the entire patch (hpn11), however, it shows a significant
>>>*decrease* in throughput for the same tests: 10% slower on OpenBSD to
>>>localhost, 12% slower on Linux to localhost, and 18% slower from Linux to
>>>OpenBSD via a 100Mb/s LAN. That's bad. I would imagine LANs are
>>>more common than the high-BDP networks that your patch targets :-)
>>
>>I'll check this, of course. We don't have any OpenBSD systems, but
>>I'll try to find a spare box we can blow it onto.
>
>
> Thanks. I'm most interested in the Linux results at this point, with
> and without the stack tuning.
>
>
>>>I suspect that the buffer size increases you do are counterproductive
>>>for such networks. I also suspect that you could get the same
>>>performance improvements on high-BDP links as you currently do by
>>>simply increasing the channel advertisements without the SSH buffer
>>>changes and relying on the TCP socket buffer for output and decrypting
>>>quickly for input, but I'm not able to test that.
>>
>>Of course it is possible to increase performance through other means.
>>Transferring data in parallel, for example, is a great way to do this. In
>>fact, the suggestions you made are ones we were planning on implementing
>>in addition to the buffer hack, especially now that we really are CPU
>>limited as opposed to buffer limited.
>>
>>However, that seems, in my view at least, to be an overly complicated
>>way of going about it. The buffer hack is pretty straightforward and
>>well known - the concept is laid out in Stevens' TCP/IP Illustrated,
>>Volume 1, after all.
>
>
> I'll have to dig out my copy and read that bit :-)
>
>
>>>Test method, with a 64MB file:
>>>$ time for i in 1 2 3 4 5 6 7 8 9 0; do scp -ocompression=no -o
>>>ciphers=arcfour /tmp/tmp localhost:/dev/null; done
>>
>>I'll try this out and let you know what I find. Could you let me know
>>what you had your tcp receive buffer set to when you tried these tests?
>>Optimally for these local tests it should be set to 64k.
>
>
> On OpenBSD: the default (16k send and recv).
>
> On Linux, whatever this decodes to:
> $ sysctl -a |egrep 'tcp.*mem'
> net.ipv4.tcp_rmem = 4096 87380 174760
> net.ipv4.tcp_wmem = 4096 16384 131072
> net.ipv4.tcp_mem = 98304 131072 196608
>
>
>>By the way - I just ran the above test and didn't see any sort of
>>substantive difference. There was a slight edge to the HPN platform
>>*but* that was probably due to the scp hack.
>>
>>Of course, in this case the problem is likely to be disk speed limits
>>(I'm on a PowerBook at the moment and its disks are slow). Bypassing scp
>>and doing a direct memory-to-memory copy is probably the right methodology.
>
>
> The source files for the Athlon were from RAM (/tmp mounted on mfs).
>
> The others were from local disk (reiserfs for Linux system, ufs
> w/softdep for the OpenBSD).
>