Difference in buffer behaviour between 8.89 and 8.9?

rapier rapier at psc.edu
Thu May 19 05:35:58 AEST 2022


Of course, 90 minutes later I find that my issues starts with commit 
b30d32159d

"upstream: add a ssh_packet_process_read() function that reads from

     a fd directly into the transport input buffer.

     Use this in the client and server mainloops to avoid unnecessary
     copying. It also lets us use a more greedy read size without penalty.

     Yields a 2-3% performance gain on cipher-speed.sh (in a fairly
     unscientific test tbf)"

I still don't know why it's causing this behaviour so if anyone has a 
clue please share?

Thanks
Chris



On 5/18/22 2:02 PM, rapier wrote:
> Hey all,
> 
> I've run in to a strange issue and I'm wondering if anyone has any 
> insight on this. So I have a modified version of OpenSSH called hpnssh 
> that uses larger receive buffer sizes to improve throughput. When I 
> ported that patch set to 8.9 I found out after an initial burst of data 
> the throughput would drop to zero after an initial burst and just sort 
> of stay there. This happens regardless of the RTT; same issues at 
> sub-millisecond RTTs and 150ms.
> 
> I've been able to replicate this behaviour in OpenSSH by increasing the 
> size of CHAN_SES_DEFAULT_WINDOW from 64*CHAN_SES_PACKET_DEFAULT to 
> higher values. You kind of see it in the debug log at 96*CSPD and you 
> can start to see it impact throughput at 128*CSDP. With my tests I have 
> it at 512*CSPD and the pause will last upwards of 5 seconds. When it's 
> at 1024*CSPD (corresponding to a 32MB receive buffer) the pause will 
> last 18 seconds or more.
> 
> I dropped in a debug statement in to sshbuf.c in sshbuf_allocate to get 
> a better view of whats happening. The following are the results of that 
> in both 8.9p1 and 8.8p1
> 
> 
> Version 8.9p1
> 
> Start of buffer growth
> 
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 98304, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 131072, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 163840, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 196608, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 229376, len: 32768 for 0x55d548211550
>> debug1: adjusted rlen: 22003712, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22036480, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22069248, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 22085632, len: 16384 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 22102016, len: 16384 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22134784, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22167552, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22200320, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 22216704, len: 16384 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 22233088, len: 16384 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22265856, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22298624, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22331392, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 22347776, len: 16384 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> — end of buffer growth
> 
> So there is a buffer that kicks in after an short initial period of good 
> throughput that has then has to grow larger that the size of 
> CHAN_SES_WINDOW_DEFAULT (128*CSPD in this case) before throughput can 
> start to pick up again. During this growth period CPU usage is pegged at 
> 100% even though throughput is minimal.
> 
> 
> Version 8.8p1
> 
> Start buffer adjustment
> 
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 49152, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 81920, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 114688, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 147456, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 180224, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 212992, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 245760, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 278528, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 311296, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 344064, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 376832, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 409600, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 442368, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 475136, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 507904, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 540672, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 573440, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 606208, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 638976, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 671744, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 704512, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 737280, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 770048, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 802816, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 835584, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 868352, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 901120, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 933888, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> 
> End buffer adjustment
> 
> In version 8.8 I also see a buffer growing during the course of the 
> transfer but it looks like it doesn’t need to grow to the same size as 
> in 8.9p1 (in this case WINDOW_DEFAULT was set to 4096*CSPD). This is on 
> the same test bed, same set up, and the tests are being run right after 
> each other.
> 
> The behaviour in 8.9 is very consistent. After a certain number of 
> adjusted rlens for, what I am assuming is the packet buffer, it goes 
> into this growth phase for another buffer. In my test that been about 
> 700 of these adjusts which is unrelated to the size of 
> CHAN_SES_WINDOW_DEFAULT
> 
> So I’m wondering what changed between 8.8 and 8.9 that might account for 
> this and if this is expected/desired behaviour. I'm still working 
> through the diff from 8.8 to 8.9 but nothing has leaped out at me yet.
> 
> Thanks,
> Chris


More information about the openssh-unix-dev mailing list