Difference in buffer behaviour between 8.89 and 8.9?

rapier rapier at psc.edu
Thu May 19 04:02:03 AEST 2022


Hey all,

I've run in to a strange issue and I'm wondering if anyone has any 
insight on this. So I have a modified version of OpenSSH called hpnssh 
that uses larger receive buffer sizes to improve throughput. When I 
ported that patch set to 8.9 I found out after an initial burst of data 
the throughput would drop to zero after an initial burst and just sort 
of stay there. This happens regardless of the RTT; same issues at 
sub-millisecond RTTs and 150ms.

I've been able to replicate this behaviour in OpenSSH by increasing the 
size of CHAN_SES_DEFAULT_WINDOW from 64*CHAN_SES_PACKET_DEFAULT to 
higher values. You kind of see it in the debug log at 96*CSPD and you 
can start to see it impact throughput at 128*CSDP. With my tests I have 
it at 512*CSPD and the pause will last upwards of 5 seconds. When it's 
at 1024*CSPD (corresponding to a 32MB receive buffer) the pause will 
last 18 seconds or more.

I dropped in a debug statement in to sshbuf.c in sshbuf_allocate to get 
a better view of whats happening. The following are the results of that 
in both 8.9p1 and 8.8p1


Version 8.9p1

Start of buffer growth

debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 98304, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 131072, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 163840, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 196608, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 229376, len: 32768 for 0x55d548211550
…
debug1: adjusted rlen: 22003712, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22036480, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22069248, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22085632, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22102016, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22134784, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22167552, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22200320, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22216704, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22233088, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22265856, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22298624, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22331392, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22347776, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
— end of buffer growth

So there is a buffer that kicks in after an short initial period of good 
throughput that has then has to grow larger that the size of 
CHAN_SES_WINDOW_DEFAULT (128*CSPD in this case) before throughput can 
start to pick up again. During this growth period CPU usage is pegged at 
100% even though throughput is minimal.


Version 8.8p1

Start buffer adjustment

debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 49152, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 81920, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 114688, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 147456, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 180224, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 212992, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 245760, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 278528, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 311296, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 344064, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 376832, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 409600, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 442368, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 475136, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 507904, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 540672, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 573440, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 606208, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 638976, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 671744, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 704512, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 737280, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 770048, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 802816, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 835584, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 868352, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 901120, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 933888, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410

End buffer adjustment

In version 8.8 I also see a buffer growing during the course of the 
transfer but it looks like it doesn’t need to grow to the same size as 
in 8.9p1 (in this case WINDOW_DEFAULT was set to 4096*CSPD). This is on 
the same test bed, same set up, and the tests are being run right after 
each other.

The behaviour in 8.9 is very consistent. After a certain number of 
adjusted rlens for, what I am assuming is the packet buffer, it goes 
into this growth phase for another buffer. In my test that been about 
700 of these adjusts which is unrelated to the size of 
CHAN_SES_WINDOW_DEFAULT

So I’m wondering what changed between 8.8 and 8.9 that might account for 
this and if this is expected/desired behaviour. I'm still working 
through the diff from 8.8 to 8.9 but nothing has leaped out at me yet.

Thanks,
Chris


More information about the openssh-unix-dev mailing list