Difference in buffer behaviour between 8.89 and 8.9?
rapier
rapier at psc.edu
Thu May 19 05:35:58 AEST 2022
Of course, 90 minutes later I find that my issues starts with commit
b30d32159d
"upstream: add a ssh_packet_process_read() function that reads from
a fd directly into the transport input buffer.
Use this in the client and server mainloops to avoid unnecessary
copying. It also lets us use a more greedy read size without penalty.
Yields a 2-3% performance gain on cipher-speed.sh (in a fairly
unscientific test tbf)"
I still don't know why it's causing this behaviour so if anyone has a
clue please share?
Thanks
Chris
On 5/18/22 2:02 PM, rapier wrote:
> Hey all,
>
> I've run in to a strange issue and I'm wondering if anyone has any
> insight on this. So I have a modified version of OpenSSH called hpnssh
> that uses larger receive buffer sizes to improve throughput. When I
> ported that patch set to 8.9 I found out after an initial burst of data
> the throughput would drop to zero after an initial burst and just sort
> of stay there. This happens regardless of the RTT; same issues at
> sub-millisecond RTTs and 150ms.
>
> I've been able to replicate this behaviour in OpenSSH by increasing the
> size of CHAN_SES_DEFAULT_WINDOW from 64*CHAN_SES_PACKET_DEFAULT to
> higher values. You kind of see it in the debug log at 96*CSPD and you
> can start to see it impact throughput at 128*CSDP. With my tests I have
> it at 512*CSPD and the pause will last upwards of 5 seconds. When it's
> at 1024*CSPD (corresponding to a 32MB receive buffer) the pause will
> last 18 seconds or more.
>
> I dropped in a debug statement in to sshbuf.c in sshbuf_allocate to get
> a better view of whats happening. The following are the results of that
> in both 8.9p1 and 8.8p1
>
>
> Version 8.9p1
>
> Start of buffer growth
>
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 98304, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 131072, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 163840, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 196608, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 229376, len: 32768 for 0x55d548211550
> …
> debug1: adjusted rlen: 22003712, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22036480, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22069248, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 22085632, len: 16384 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 22102016, len: 16384 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22134784, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22167552, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22200320, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 22216704, len: 16384 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 22233088, len: 16384 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22265856, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22298624, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 22331392, len: 32768 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 22347776, len: 16384 for 0x55d548211550
> debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
> — end of buffer growth
>
> So there is a buffer that kicks in after an short initial period of good
> throughput that has then has to grow larger that the size of
> CHAN_SES_WINDOW_DEFAULT (128*CSPD in this case) before throughput can
> start to pick up again. During this growth period CPU usage is pegged at
> 100% even though throughput is minimal.
>
>
> Version 8.8p1
>
> Start buffer adjustment
>
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 49152, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 81920, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 114688, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 147456, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 180224, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 212992, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 245760, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 278528, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 311296, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 344064, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 376832, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 409600, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 442368, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 475136, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 507904, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 540672, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 573440, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 606208, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 638976, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 671744, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 704512, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 737280, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 770048, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 802816, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 835584, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 868352, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 901120, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 933888, len: 32768 for 0x5570cf28a3e0
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
> debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
>
> End buffer adjustment
>
> In version 8.8 I also see a buffer growing during the course of the
> transfer but it looks like it doesn’t need to grow to the same size as
> in 8.9p1 (in this case WINDOW_DEFAULT was set to 4096*CSPD). This is on
> the same test bed, same set up, and the tests are being run right after
> each other.
>
> The behaviour in 8.9 is very consistent. After a certain number of
> adjusted rlens for, what I am assuming is the packet buffer, it goes
> into this growth phase for another buffer. In my test that been about
> 700 of these adjusts which is unrelated to the size of
> CHAN_SES_WINDOW_DEFAULT
>
> So I’m wondering what changed between 8.8 and 8.9 that might account for
> this and if this is expected/desired behaviour. I'm still working
> through the diff from 8.8 to 8.9 but nothing has leaped out at me yet.
>
> Thanks,
> Chris
More information about the openssh-unix-dev
mailing list