High Performance SSH/SCP - HPN-SSH when?

Wed Apr 5 10:09:15 EST 2006

Chris Rapier wrote:
> 
> Corinna Vinschen wrote:
>> On Mar 25 23:37, Chris Rapier wrote:
>>> [http://www.psc.edu/networking/projects/hpn-ssh/]
>>> Development is ongoing and a new version of the patch will be available 
>>> shortly. This new version of the patch will hopefully address a 
>>> performance issue in LAN transfers.
>> Not sure if it's ok to discuss this here. However, we have some
>> performance problems on Cygwin with the vanilla version of OpenSSH.  The
>> main problem is the size of the read buffers in the client loop, resp.
>> client_process_net_input/client_process_input.  The buffer size is a
>> fixed 8K.  For some reason this degrades performance on Windows
>> enormously, when a copy is made from a remote machine to the local
>> machine, like this:
> 
> I'm guessing that this is probably an issue in the implementation of 
> cygwin. Following the code I would again guess that it's probably the 
> memcpy in buffer_append doing it. Perhaps many small memcpys being more 
> expensive than a few larger ones in the cygwin environment? Is there 
> some sort of trace functionality under cygwin?

I disagree, see below.

>> I also tried the HPN patch, but it doesn't help for this specific
>> situation.  The HPN patch does not touch the application's read buffer
>> size and it turns out that neither changing the underlying SO_RCVBUF
>> buffer sizes nor changing TCP_NODELAY have a really relaxing effect on
>> this situation (less than 10%).  
>>
>> On the contrary, keeping the application buffers at 8K, the performance
>> even degrades with the HPN patch:
>>
>>   cygwin> scp linux:file .
>>   file   100%  118MB   1.0MB/s   01:58
> 
> So this is an additional decrease in performance if the HPN patch is 
> used with 8k buffers? What can you tell me about the path? Does it have 
> a very low RTT? The available version of HPN (HPN11) polls the SO_RCVBUF 
> once every window so in low RTT environments the additional cycles spent 
> on this could have an impact. The beta version of the patch (HPN12) 
> provides a switch to disable buffer polling. I'm still working on this 
> issue (recreating the problems consistently has been an issue for me) 
> but you might want to look at the HPN12 patch set.

I think this is another symptom of the HPN patch letting the buffers get
way too big under some conditions, so that ssh spends a disproportionate
amount of time compacting those buffers.

Assume that 8k writes are relatively slow on Cygwin (which appears to
be the case[1]).  ssh will be emptying the output buffer relatively
slowly, but the CPU can encrypt much faster than the IO rate.  Normally
the buffer would peak at 5-10 MB under these conditions, but the
BUFFER_MAX_HPN_LEN change means that it can grow really big (up to
2^29 bytes, i.e. 512MB).  The buffer gets compacted once 1MB has been
consumed, so the process becomes "read 1MB in 8k chunks, then memmove
(2^29 - 1M) bytes".
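
To put rough numbers on that, here is a toy model of the steady state
described above.  It is not OpenSSH code: the function name and the
assumption that the backlog stays constant are purely illustrative.  It
only counts how many bytes get moved by compaction for every byte that
is actually written out.

```python
CHUNK = 8 * 1024            # size of each write, as in the 8k case above
COMPACT_AT = 1024 * 1024    # consumed bytes that trigger a compaction


def bytes_moved_per_byte_sent(backlog):
    """Compaction cost per byte written (hypothetical steady-state model).

    Assumes the buffer always holds `backlog` bytes when a compaction
    starts, and that each compaction moves everything except the 1MB
    that has just been consumed.
    """
    writes_per_compaction = COMPACT_AT // CHUNK       # 128 writes of 8k
    sent = writes_per_compaction * CHUNK              # 1MB written out
    moved = max(backlog - COMPACT_AT, 0)              # bytes memmove'd
    return moved / sent


if __name__ == "__main__":
    for label, backlog in (("8MB backlog", 8 * 2**20),
                           ("2^29 backlog", 2**29)):
        print(label, bytes_moved_per_byte_sent(backlog))
```

Under these assumptions an 8MB backlog costs 7 bytes moved per byte
sent, while a full 2^29 byte backlog costs 511.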

Corinna, if this is the case you should see ssh consuming a lot more
memory, more CPU and, if you can profile it, spending a lot more time in
memmove.

BTW I think this occurs in the main tree too, but the effects are much
less noticeable because the buffers are smaller, and this is why scp's
reported throughput sometimes drops off slightly after the initial
connection.

The solution is to not write more than, say, 1-2 x TCP window size to
the output buffer.  If you're IO bound on the source, you'll never write
more than 1 x TCPWINSZ to the socket and have plenty of CPU to fill it
up again.  If you're CPU bound, you won't spend extra time compacting
the buffers so it will be a small performance improvement.
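
A minimal sketch of that cap, again as a toy simulation rather than
anything from the ssh source.  The names bytes_to_queue and
peak_backlog, the per-tick rates and the 64k window are all made-up
values for illustration; the only idea taken from the text above is
"never queue more than a small multiple of the TCP window".

```python
def bytes_to_queue(buffered, available, tcp_window, factor=2):
    """How many bytes the producer may append to the output buffer.

    The buffer is never allowed to hold more than factor * tcp_window.
    """
    limit = factor * tcp_window
    return min(available, max(limit - buffered, 0))


def peak_backlog(total, produce_per_tick, drain_per_tick, tcp_window,
                 factor=2):
    """Simulate a fast producer and a slow socket; return the peak backlog."""
    produced = sent = buffered = peak = 0
    while sent < total:
        # The producer (e.g. encryption) offers data faster than it drains.
        want = min(produce_per_tick, total - produced)
        n = bytes_to_queue(buffered, want, tcp_window, factor)
        produced += n
        buffered += n
        peak = max(peak, buffered)
        # The consumer writes at most one slow chunk per tick.
        out = min(drain_per_tick, buffered)
        buffered -= out
        sent += out
    return peak


if __name__ == "__main__":
    total = 64 * 2**20
    # Capped at 2 x 64k: the backlog stays at the 128k limit.
    print(peak_backlog(total, 2**20, 8 * 1024, tcp_window=64 * 1024))
    # Effectively uncapped: the backlog grows to nearly the whole file.
    print(peak_backlog(total, 2**20, 8 * 1024, tcp_window=2**29))
```

With the cap in place there is never much left in the buffer to
compact, which is the point of the suggestion.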

I think a similar but inverted problem occurs on the sink.

[1] or that a small write is an O(1) operation with a large constant.

-- 
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4  37C9 C982 80C7 8FF4 FA69
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.