Question performance of SSH v1 vs SSH v2

Tue Mar 1 03:09:26 EST 2005

Damien Miller wrote:
> Amba Giri wrote:
> 
>> Hello
>>
>> I have ported OpenSSH 3.8p1 to a LynxOS platform.  Recently I heard a
>> report from the field that  v2 is perceived to be significantly slower
>> than v1.  Is this a known issue? Are there any configuration parameters
>> that can be modified to make v2 faster?
> 
> 
> Protocol 2 is slower because it includes a real per-packet MAC instead
> of a weak checksum. You can save some overhead by using a truncated MAC
> like hmac-sha1-96, but there is always going to be more work per packet.

I'm not sure this is entirely true. If transfer rates were strictly
CPU limited then we wouldn't see the typical 'fast on the LAN, slow on
the WAN' problem. If you saw 20Mbps on the LAN, being CPU limited
wouldn't explain why it slows down to 200Kbps on the WAN.
Additionally, if it were really CPU limited we'd expect throughput to
increase as we moved to faster machines. Instead, on any given path,
we reach a throughput plateau that is independent of CPU power once
a certain (relatively low) threshold is passed.

A closer analysis of the data reveals that the most significant cause of
the slowdown is the flow control mechanism used for SSH2 channels. This
flow control is similar to TCP windowing, so the maximum theoretical
throughput of the connection (assuming no CPU constraints) is
determined, in large part, by the minimum of the TCP receive window and
the SSH2 flow control buffer. If you update the BDP equation (as
outlined in Stevens' TCP/IP Illustrated) and solve for throughput, you
end up with

            MIN(TCP rwin, SSH2 FC buf)
bandwidth = --------------------------
                       RTT

Since the effective SSH2 flow control buffer is 64K (it's actually
defined as 128K but only half of it is used) and most TCP receive
windows are now ~80K, most people will see throughput bounded
specifically by the 64K buffer. So someone on an 80ms path will see a
theoretical maximum throughput of 64K/80ms, or 800KB/s (about 6.4Mbps).
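
To make the arithmetic concrete, here is a rough Python sketch of the
bound above. The function name and the exact 80K receive window are
just placeholders for illustration; nothing here comes from the OpenSSH
source.

```python
# Sketch of the window-limited bound: min(TCP rwin, SSH2 FC buf) / RTT.
# All names are illustrative; sizes are in bytes, RTT in seconds.

KIB = 1024


def max_throughput_bytes_per_sec(tcp_rwin, ssh_fc_buf, rtt_sec):
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return min(tcp_rwin, ssh_fc_buf) / rtt_sec


if __name__ == "__main__":
    # Assumed values: ~80K TCP receive window, 64K effective SSH2 buffer,
    # 80 ms round-trip time (the example path from the text).
    rate = max_throughput_bytes_per_sec(80 * KIB, 64 * KIB, 0.080)
    print("%.0f KB/s" % (rate / KIB))
    print("%.2f Mbps" % (rate * 8 / 1e6))
```

Note that the Mbps figure comes out slightly above 6.4 if you treat 64K
as 65536 bytes rather than 64000; the difference is only rounding.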

Obviously, this assumes: 1) that you have a path between the two
locations that isn't bottlenecked by something else, and 2) that you
have a reasonable processor (say a PIII).

If the SSH2 flow control buffer is increased (edit the code, recompile) 
to some higher value you'll often see the throughput increase linearly 
(doubling the buffer doubles the throughput). Some simple tests can 
confirm this assuming that you have a fat enough network path to play with.

Right now, with a modified build, I'm typically getting 240 to 350Mbps
on a 35ms path (using RC4). At these speeds I'm processor bound. With an
unmodified version of OpenSSH on the same path I only get around
14Mbps, and top shows maybe 5 or 6% CPU.
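
For anyone who wants to check these numbers against the window formula,
here is a small Python sketch. It assumes the TCP receive window is not
the smaller of the two limits, and the function names are made up for
illustration. It computes the window-limited rate for the stock 64K
buffer on a 35ms path, and the buffer you would need (rate * RTT) to
sustain a given target rate.

```python
# Sketch: window-limited rate for a given buffer, and the buffer needed
# for a target rate. Assumes the TCP receive window is at least as large
# as the SSH2 flow control buffer. Names are illustrative only.


def window_limited_mbps(window_bytes, rtt_sec):
    """Throughput ceiling in Mbps when one window is sent per round trip."""
    return window_bytes * 8 / rtt_sec / 1e6


def required_buffer_bytes(target_bits_per_sec, rtt_sec):
    """Bandwidth-delay product: bytes that must be in flight to hit the target."""
    return target_bits_per_sec / 8 * rtt_sec


if __name__ == "__main__":
    rtt = 0.035  # the 35 ms path from the text

    # Stock effective buffer of 64K.
    print("64K buffer: %.1f Mbps" % window_limited_mbps(64 * 1024, rtt))

    # Buffer sizes needed for the rates quoted above.
    for mbps in (240, 350):
        need = required_buffer_bytes(mbps * 1e6, rtt)
        print("%d Mbps needs about %.0f KB of buffer" % (mbps, need / 1024))
```

The stock case works out to roughly 15Mbps, which lines up with the
~14Mbps measured on the unmodified build.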

> I have looked at implementing AES CCM, which could be much faster,
> particularly on platforms with AES implemented in CPU instructions, but
> it doesn't fit nicely in the cipher and MAC negotiation mechanism.

That would actually be amazingly cool. Are you going to work on it at 
all or would it just be too much of a hack to incorporate it at this point?