[PATCH] Added NoDelay config option and nodelay subsystem option
Rick Jones
rick_jones2 at hp.com
Wed Jan 30 12:45:50 EST 2002
Just for grins, I decided to hack an initial request burst into the
netperf TCP_RR test, and then run that test between two systems with
dummynet installed. Dummynet allows simulation of link rate and delay
(and loss rates and such) and can be pushed into HP-UX 11 as a STREAMS
module. I initially set up dummynet to simulate a typical user's DSL
link with 128000 bps up, 384000 bps down, and 100 ms of delay.
Making only partially informed guesses, I assumed that a read request
was something like 128 bytes and that the reply was around 8 KB. If the
bitrate back to me is 384000 bps, roughly 5.8 transactions per second
should be the maximum the TCP_RR test reports.
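A quick back-of-the-envelope check of that ceiling (a sketch in Python,
using the assumed 8192-byte response and ignoring headers):

    # Ceiling on TCP_RR transactions/sec when serializing the response
    # onto the 384000 bps downlink is the bottleneck (headers ignored).
    response_bytes = 8192
    down_bps = 384000
    tx_s = response_bytes * 8 / down_bps   # ~0.171 s per response
    print(1 / tx_s)                        # ~5.86 transactions/sec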
At 384000 bps it will take ~171 milliseconds to transmit the response
onto the wire. When the burst is set to one, that means that, from the
application's point of view, two requests are outstanding at any one
time. The first TCP segment of a response will carry the ACK for that
request, and it will take only 50 milliseconds (plus transmit time) for
the queued request to reach the sender, so everything should saturate
with a pre-burst of only one in this case:
TCP REQUEST/RESPONSE TEST to 192.168.1.30
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.
Send   Recv   Size    Size   Time    Rate
bytes  Bytes  bytes   bytes  secs.   per sec

Pre-burst is 0
57344  57344  128     8192   10.01    3.40
57344  57344
Pre-burst is 1
57344  57344  128     8192   10.01    5.49
57344  57344
Pre-burst is 2
57344  57344  128     8192   10.01    5.49
57344  57344
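For anyone curious what the "pre-burst" hack amounts to, here is a
minimal sketch in Python (the real change is to netperf's C code, which
this is not): the client sends burst extra requests up front and then
falls into the usual send-one/receive-one loop, so burst+1 requests are
in flight at any time. It assumes a netserver-like peer that returns
rsp_size bytes for every req_size-byte request; host, port, and sizes
are placeholders.

    import socket, time

    def tcp_rr_with_preburst(host, port, req_size=128, rsp_size=8192,
                             burst=1, duration=10.0):
        # Hypothetical stand-in for the hacked netperf TCP_RR loop.
        s = socket.create_connection((host, port))
        req = b"x" * req_size
        for _ in range(burst):          # the pre-burst: extra requests in flight
            s.sendall(req)
        transactions = 0
        end = time.time() + duration
        while time.time() < end:
            s.sendall(req)              # send one more request...
            remaining = rsp_size
            while remaining:            # ...and wait for one complete response
                data = s.recv(remaining)
                if not data:
                    raise ConnectionError("peer closed the connection")
                remaining -= len(data)
            transactions += 1
        s.close()
        return transactions / duration  # transactions per second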
There was no problem with Nagle here, even with the sub-MSS request
size. The limit of 5.49 trans/s sounds about right given the other
overheads (headers, etc.) that were not counted in the 5.8 estimate.
Now up the bitrate to 384/1500, keeping the 100 milliseconds of delay.
My theoretical max trans/s is now something like 22.9. This change
should mean that the time to transmit the response onto the wire is
less than the RTT. It will also give me a higher transaction rate, so
my CPU utilization measures have some hope of accuracy...
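Working the same arithmetic for the faster link (a sketch; I read
"384/1500" as 384 kbit/s up and 1500 kbit/s down, and headers are still
ignored):

    rsp_bits = 8192 * 8
    req_bits = 128 * 8
    down_bps, up_bps = 1_500_000, 384_000   # "384/1500" read as kbit/s down/up
    rtt_s = 0.100                           # the simulated 100 ms of delay
    tx_rsp = rsp_bits / down_bps            # ~0.044 s, now well under the RTT
    tx_req = req_bits / up_bps              # ~0.003 s
    print(1 / tx_rsp)                       # ~22.9 trans/s once the pipe is kept full
    print(1 / (rtt_s + tx_rsp + tx_req))    # ~6.8 trans/s with one transaction in flight

That ~6.8 trans/s figure is in the same ballpark as the 7.09 measured
below with no pre-burst.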
TCP REQUEST/RESPONSE TEST to 192.168.1.30
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU     S.dem    S.dem
Send   Recv   Size    Size   Time    Rate     local  remote  local    remote
bytes  bytes  bytes   bytes  secs.   per sec  % I    % U     us/Tr    us/Tr

Pre-burst is 0
57344  57344  128     8192   10.01    7.09    0.15   -1.00   210.241  -1.000
57344  57344
Pre-burst is 1
57344  57344  128     8192   10.01    8.99    0.13   -1.00   139.903  -1.000
57344  57344
Pre-burst is 2
57344  57344  128     8192   10.01   12.69    0.14   -1.00   113.075  -1.000
57344  57344
Pre-burst is 3
57344  57344  128     8192   10.01   15.39    0.28   -1.00   183.987  -1.000
57344  57344
Pre-burst is 4
57344  57344  128     8192   10.01   16.09    0.32   -1.00   201.202  -1.000
57344  57344
Pre-burst is 5
57344  57344  128     8192   10.01   17.09    0.22   -1.00   128.477  -1.000
57344  57344
Pre-burst is 6
57344  57344  128     8192   10.01   21.08    0.27   -1.00   127.765  -1.000
57344  57344
Pre-burst is 7
57344  57344  128     8192   10.01   21.18    0.25   -1.00   117.972  -1.000
57344  57344
Pre-burst is 8
57344  57344  128     8192   10.01   21.18    0.30   -1.00   142.988  -1.000
57344  57344
Pre-burst is 9
57344  57344  128     8192   10.01   21.18    0.23   -1.00   107.349  -1.000
57344  57344
Pre-burst is 10
57344  57344  128     8192   10.01   21.28    0.18   -1.00    85.826  -1.000
57344  57344
Pre-burst is 11
57344  57344  128     8192   10.01   21.18    0.22   -1.00   104.132  -1.000
57344  57344
Pre-burst is 12
57344  57344  128     8192   10.01   21.28    0.16   -1.00    73.787  -1.000
57344  57344
Pre-burst is 13
57344  57344  128     8192   10.01   21.18    0.15   -1.00    73.119  -1.000
57344  57344
Pre-burst is 14
57344  57344  128     8192   10.01   21.28    0.12   -1.00    58.184  -1.000
57344  57344
You can see that the service demand is somewhat "noisy", but that once
we get near the MSS/request-size ratio (1460/128, roughly 11.41, here)
there is a large drop in the service demand. Now, try the same thing
with TCP_NODELAY set (a sketch of the setsockopt() involved follows the
next table):
TCP REQUEST/RESPONSE TEST to 192.168.1.30: nodelay
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU     S.dem    S.dem
Send   Recv   Size    Size   Time    Rate     local  remote  local    remote
bytes  bytes  bytes   bytes  secs.   per sec  % I    % U     us/Tr    us/Tr

Pre-burst is 0
57344  57344  128     8192   10.01    7.10    0.04   -1.00    60.621  -1.000
57344  57344
Pre-burst is 1
57344  57344  128     8192   10.01   13.89    0.17   -1.00   123.216  -1.000
57344  57344
Pre-burst is 2
57344  57344  128     8192   10.01   20.19    0.23   -1.00   114.587  -1.000
57344  57344
Pre-burst is 3
57344  57344  128     8192   10.01   21.29    0.43   -1.00   199.874  -1.000
57344  57344
Pre-burst is 4
57344  57344  128     8192   10.01   21.29    0.36   -1.00   168.648  -1.000
57344  57344
Pre-burst is 5
57344  57344  128     8192   10.01   21.29    0.26   -1.00   121.913  -1.000
57344  57344
Pre-burst is 6
57344  57344  128     8192   10.01   21.29    0.31   -1.00   145.254  -1.000
57344  57344
Pre-burst is 7
57344  57344  128     8192   10.01   21.29    0.15   -1.00    69.243  -1.000
57344  57344
Pre-burst is 8
57344  57344  128     8192   10.01   21.29    0.16   -1.00    76.577  -1.000
57344  57344
Pre-burst is 9
57344  57344  128     8192   10.01   21.29    0.15   -1.00    72.690  -1.000
57344  57344
Pre-burst is 10
57344  57344  128     8192   10.01   21.29    0.16   -1.00    74.079  -1.000
57344  57344
Pre-burst is 11
57344  57344  128     8192   10.01   21.29    0.20   -1.00    95.971  -1.000
57344  57344
Pre-burst is 12
57344  57344  128     8192   10.00   21.29    0.21   -1.00    96.921  -1.000
57344  57344
Pre-burst is 13
57344  57344  128     8192   10.00   21.29    0.19   -1.00    90.452  -1.000
57344  57344
Pre-burst is 14
57344  57344  128     8192   10.00   21.29    0.23   -1.00   107.179  -1.000
57344  57344
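For reference, the setsockopt() mentioned above: TCP_NODELAY is a
single socket option, shown here in Python on a throwaway socket just
to illustrate the call.

    import socket

    # Minimal illustration of disabling Nagle's algorithm.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # With TCP_NODELAY set, small (sub-MSS) writes go out immediately
    # instead of waiting for previously sent data to be acknowledged.
    s.close()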
I will admit here that the behaviour is not as I expected. That setting
TCP_NODELAY reached the max trans/s sooner is not a big surprise. What
was a surprise was that the service demand dropped sooner and then
started going back up. I suspect it is all a matter of timing and
whether or not there was some batching happening by other means.
I suspect that the initial congestion window also has something to do
with it. Even if TCP_NODELAY is set, for a sufficiently large burst
size only the first N requests of the burst will go out at once, and
the remainder will sit there waiting for the congestion window to open.
I suppose I need to look at tcpdump traces of all this :(.... time
passes.... Under HP-UX 11 it would seem that the tcp_cwnd_initial of
4*MSS translates into enough initial cwnd for all 14 requests of the
first burst of 14 to be sent without cwnd delay when TCP_NODELAY is
set. This is consistent with the cwnd being implemented on a byte-count
basis.
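A quick check of that byte-counting argument (a sketch; 1460 bytes is
my assumption for the MSS, consistent with the 11.41 figure above):

    mss = 1460
    req_size = 128
    initial_cwnd = 4 * mss                        # HP-UX 11 tcp_cwnd_initial of 4*MSS
    burst = 14
    print(burst * req_size, "<=", initial_cwnd)   # 1792 <= 5840: the whole burst fits
    # A cwnd counted in segments (4 segments) would have let only 4 of
    # the 14 small requests out before waiting for ACKs.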
rick jones
--
Wisdom Teeth are impacted, people are affected by the effects of events.
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...