[PATCH] Added NoDelay config option and nodelay subsystem option
Rick Jones
rick_jones2 at hp.com
Wed Jan 30 12:45:50 EST 2002
Just for grins, I decided to hack an initial request burst into the
netperf TCP_RR test, and then run that test between two systems with
dummynet installed. Dummynet allows simulation of link rate and delay
(and loss rates and such) and can be pushed into HP-UX 11 as a STREAMS
module. I initially set up dummynet to simulate a typical user's DSL
link with 128000 bps up, 384000 bps down, and 100 ms of delay.
Making only partially informed guesses, I assumed that a read request
was something like 128 bytes and that the reply was around 8 KB. If the
bitrate back to me is 384000 bps, roughly 5.8 transactions per second
should be the maximum the TCP_RR test reports.
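A quick back-of-the-envelope check of that ceiling (a sketch in Python,
using the assumed 8192-byte response and ignoring headers):

    # Ceiling on TCP_RR transactions/sec when serializing the response
    # onto the 384000 bps downlink is the bottleneck (headers ignored).
    response_bytes = 8192
    down_bps = 384000
    tx_s = response_bytes * 8 / down_bps   # ~0.171 s per response
    print(1 / tx_s)                        # ~5.86 transactions/sec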
At 384000 bps it will take ~171 milliseconds to transmit the response
onto the wire. When the burst is set to one, that means that, from the
application's point of view, two requests are outstanding at any one
time. The first TCP segment of a response will carry the ACK for that
request, and it will take only 50 milliseconds (plus transmit time) for
the queued request to reach the sender, so everything should saturate
with a pre-burst of only one in this case:
TCP REQUEST/RESPONSE TEST to 192.168.1.30
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.
Send   Recv   Size    Size   Time    Rate
bytes  Bytes  bytes   bytes  secs.   per sec

Pre-burst is 0
57344  57344  128     8192   10.01    3.40
57344  57344
Pre-burst is 1
57344  57344  128     8192   10.01    5.49
57344  57344
Pre-burst is 2
57344  57344  128     8192   10.01    5.49
57344  57344
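For anyone curious what the "pre-burst" hack amounts to, here is a
minimal sketch in Python (the real change is to netperf's C code, which
this is not): the client sends burst extra requests up front and then
falls into the usual send-one/receive-one loop, so burst+1 requests are
in flight at any time. It assumes a netserver-like peer that returns
rsp_size bytes for every req_size-byte request; host, port, and sizes
are placeholders.

    import socket, time

    def tcp_rr_with_preburst(host, port, req_size=128, rsp_size=8192,
                             burst=1, duration=10.0):
        # Hypothetical stand-in for the hacked netperf TCP_RR loop.
        s = socket.create_connection((host, port))
        req = b"x" * req_size
        for _ in range(burst):          # the pre-burst: extra requests in flight
            s.sendall(req)
        transactions = 0
        end = time.time() + duration
        while time.time() < end:
            s.sendall(req)              # send one more request...
            remaining = rsp_size
            while remaining:            # ...and wait for one complete response
                data = s.recv(remaining)
                if not data:
                    raise ConnectionError("peer closed the connection")
                remaining -= len(data)
            transactions += 1
        s.close()
        return transactions / duration  # transactions per second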
There was no problem with Nagle here, even with the sub-MSS request
size. The limit of 5.49 trans/s sounds about right given the other
overheads (headers, etc.) that were not counted in the 5.8 estimate.
Now up the bitrate to 384/1500, keeping the 100 milliseconds of delay.
My theoretical max trans/s is now something like 22.9. This change
should mean that the time to transmit the response onto the wire is
less than the RTT. It will also give me a higher transaction rate, so
my CPU utilization measures have some hope of accuracy...
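Working the same arithmetic for the faster link (a sketch; I read
"384/1500" as 384 kbit/s up and 1500 kbit/s down, and headers are still
ignored):

    rsp_bits = 8192 * 8
    req_bits = 128 * 8
    down_bps, up_bps = 1_500_000, 384_000   # "384/1500" read as kbit/s down/up
    rtt_s = 0.100                           # the simulated 100 ms of delay
    tx_rsp = rsp_bits / down_bps            # ~0.044 s, now well under the RTT
    tx_req = req_bits / up_bps              # ~0.003 s
    print(1 / tx_rsp)                       # ~22.9 trans/s once the pipe is kept full
    print(1 / (rtt_s + tx_rsp + tx_req))    # ~6.8 trans/s with one transaction in flight

That ~6.8 trans/s figure is in the same ballpark as the 7.09 measured
below with no pre-burst.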
TCP REQUEST/RESPONSE TEST to 192.168.1.30
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU     S.dem    S.dem
Send   Recv   Size    Size   Time    Rate     local  remote  local    remote
bytes  bytes  bytes   bytes  secs.   per sec  % I    % U     us/Tr    us/Tr

Pre-burst is 0
57344  57344  128     8192   10.01    7.09    0.15   -1.00   210.241  -1.000
57344  57344
Pre-burst is 1
57344  57344  128     8192   10.01    8.99    0.13   -1.00   139.903  -1.000
57344  57344
Pre-burst is 2
57344  57344  128     8192   10.01   12.69    0.14   -1.00   113.075  -1.000
57344  57344
Pre-burst is 3
57344  57344  128     8192   10.01   15.39    0.28   -1.00   183.987  -1.000
57344  57344
Pre-burst is 4
57344  57344  128     8192   10.01   16.09    0.32   -1.00   201.202  -1.000
57344  57344
Pre-burst is 5
57344  57344  128     8192   10.01   17.09    0.22   -1.00   128.477  -1.000
57344  57344
Pre-burst is 6
57344  57344  128     8192   10.01   21.08    0.27   -1.00   127.765  -1.000
57344  57344
Pre-burst is 7
57344  57344  128     8192   10.01   21.18    0.25   -1.00   117.972  -1.000
57344  57344
Pre-burst is 8
57344  57344  128     8192   10.01   21.18    0.30   -1.00   142.988  -1.000
57344  57344
Pre-burst is 9
57344  57344  128     8192   10.01   21.18    0.23   -1.00   107.349  -1.000
57344  57344
Pre-burst is 10
57344  57344  128     8192   10.01   21.28    0.18   -1.00    85.826  -1.000
57344  57344
Pre-burst is 11
57344  57344  128     8192   10.01   21.18    0.22   -1.00   104.132  -1.000
57344  57344
Pre-burst is 12
57344  57344  128     8192   10.01   21.28    0.16   -1.00    73.787  -1.000
57344  57344
Pre-burst is 13
57344  57344  128     8192   10.01   21.18    0.15   -1.00    73.119  -1.000
57344  57344
Pre-burst is 14
57344  57344  128     8192   10.01   21.28    0.12   -1.00    58.184  -1.000
57344  57344
You can see that the service demand is somewhat "noisy", but that once
we get near the MSS/request-size ratio (1460/128, roughly 11.41, here)
there is a large drop in the service demand. Now, try the same thing
with TCP_NODELAY set (a sketch of the setsockopt() involved follows the
next table):
TCP REQUEST/RESPONSE TEST to 192.168.1.30: nodelay
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU     S.dem    S.dem
Send   Recv   Size    Size   Time    Rate     local  remote  local    remote
bytes  bytes  bytes   bytes  secs.   per sec  % I    % U     us/Tr    us/Tr

Pre-burst is 0
57344  57344  128     8192   10.01    7.10    0.04   -1.00    60.621  -1.000
57344  57344
Pre-burst is 1
57344  57344  128     8192   10.01   13.89    0.17   -1.00   123.216  -1.000
57344  57344
Pre-burst is 2
57344  57344  128     8192   10.01   20.19    0.23   -1.00   114.587  -1.000
57344  57344
Pre-burst is 3
57344  57344  128     8192   10.01   21.29    0.43   -1.00   199.874  -1.000
57344  57344
Pre-burst is 4
57344  57344  128     8192   10.01   21.29    0.36   -1.00   168.648  -1.000
57344  57344
Pre-burst is 5
57344  57344  128     8192   10.01   21.29    0.26   -1.00   121.913  -1.000
57344  57344
Pre-burst is 6
57344  57344  128     8192   10.01   21.29    0.31   -1.00   145.254  -1.000
57344  57344
Pre-burst is 7
57344  57344  128     8192   10.01   21.29    0.15   -1.00    69.243  -1.000
57344  57344
Pre-burst is 8
57344  57344  128     8192   10.01   21.29    0.16   -1.00    76.577  -1.000
57344  57344
Pre-burst is 9
57344  57344  128     8192   10.01   21.29    0.15   -1.00    72.690  -1.000
57344  57344
Pre-burst is 10
57344  57344  128     8192   10.01   21.29    0.16   -1.00    74.079  -1.000
57344  57344
Pre-burst is 11
57344  57344  128     8192   10.01   21.29    0.20   -1.00    95.971  -1.000
57344  57344
Pre-burst is 12
57344  57344  128     8192   10.00   21.29    0.21   -1.00    96.921  -1.000
57344  57344
Pre-burst is 13
57344  57344  128     8192   10.00   21.29    0.19   -1.00    90.452  -1.000
57344  57344
Pre-burst is 14
57344  57344  128     8192   10.00   21.29    0.23   -1.00   107.179  -1.000
57344  57344
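For reference, the setsockopt() mentioned above: TCP_NODELAY is a
single socket option, shown here in Python on a throwaway socket just
to illustrate the call.

    import socket

    # Minimal illustration of disabling Nagle's algorithm.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # With TCP_NODELAY set, small (sub-MSS) writes go out immediately
    # instead of waiting for previously sent data to be acknowledged.
    s.close()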
I will admit here that the behaviour is not as I expected. That setting
TCP_NODELAY reached the max trans/s sooner is not a big surprise. What
was a surprise was that the service demand dropped sooner and then
started going back up. I suspect it is all a matter of timing and
whether or not there was some batching happening by other means.
I suspect that the initial congestion window also has something to do
with it. Even if TCP_NODELAY is set, for a sufficiently large burst
size only the first N requests of the burst will go out at once, and
the remainder will sit there waiting for the congestion window to open.
I suppose I need to look at tcpdump traces of all this :(.... time
passes.... Under HP-UX 11 it would seem that the tcp_cwnd_initial of
4*MSS translates into enough initial cwnd for all 14 requests of the
first burst of 14 to be sent without cwnd delay when TCP_NODELAY is
set. This is consistent with the cwnd being implemented on a byte-count
basis.
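A quick check of that byte-counting argument (a sketch; 1460 bytes is
my assumption for the MSS, consistent with the 11.41 figure above):

    mss = 1460
    req_size = 128
    initial_cwnd = 4 * mss                        # HP-UX 11 tcp_cwnd_initial of 4*MSS
    burst = 14
    print(burst * req_size, "<=", initial_cwnd)   # 1792 <= 5840: the whole burst fits
    # A cwnd counted in segments (4 segments) would have let only 4 of
    # the 14 small requests out before waiting for ACKs.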
rick jones
--
Wisdom Teeth are impacted, people are affected by the effects of events.
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...