Nagle & delayed ACK strike again
Miklos Szeredi
miklos at szeredi.hu
Fri Dec 22 10:14:40 EST 2006
> > To me it still looks like the use of Nagle is the exception, it has
> > already been turned off in the server for
> >
> > - interactive sessions
>
> For at least some interactive sessions. In the telnet space at least,
> there is this constant back and forth happening between wanting
> keystrokes to be nice and uniform, and not overwhelming slow terminal
> devices (e.g. barcode scanners) when applications on the server dump a
> bunch of stuff down stdio.
For ssh, disabling Nagle is unconditional.  I've suggested adding
NoDelay/NoNoDelay options, but somebody on this list vetoed that.
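
For reference, all such a knob would really do is toggle the
TCP_NODELAY socket option on the session's TCP connection.  A minimal
sketch (the function name is mine, this is not actual OpenSSH code):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <stdio.h>

/* Toggle Nagle on a connected TCP socket.
 * enable != 0 sets TCP_NODELAY (Nagle off); enable == 0 clears it
 * (Nagle back on). */
static int set_nodelay(int fd, int enable)
{
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
                   &enable, sizeof(enable)) == -1) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    return 0;
}

A NoDelay option would call this with 1, a NoNoDelay option with 0.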
> > - X11 forwarding
> >
> > and it will need to be turned off for
> >
> > - SFTP transport
> >
> > - IP tunnelling
> >
> > - ???
> >
> > Is there any transported protocol where Nagle does make sense?
>
> Regular FTP is one, anything unidirectional.
Nagle doesn't help FTP or HTTP, does it?  Anything that just pushes a
big chunk of data will automatically end up with big packets.
So other than the disputed interactive session, Nagle doesn't seem to
have any positive effects.
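
And for protocols that do generate lots of small pieces, the usual fix
is to coalesce in the application rather than rely on Nagle, e.g. with
writev().  A rough sketch, with made-up names, not actual ssh/sftp
code:

#include <sys/uio.h>
#include <unistd.h>

/* Send a small protocol header and its payload in one syscall, so the
 * kernel can emit one large segment instead of hoping Nagle glues two
 * small write()s together.  Caller must still handle short writes. */
static ssize_t send_packet(int fd, const void *hdr, size_t hdrlen,
                           const void *payload, size_t paylen)
{
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,     .iov_len = hdrlen },
        { .iov_base = (void *)payload, .iov_len = paylen },
    };
    return writev(fd, iov, 2);
}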
> It also depends on what one is trying to optimize. If one is only
> interested in optimizing time, Nagle may not be the thing. However,
> Nagle can optimize the ratio of data to data+headers and it can optimize
> the quantity of CPU consumed per unit of data transferred.
For a filesystem protocol, latency (and hence throughput) is obviously
the most important factor.
> Some netperf data for the unidirectional case, between a system in Palo
> Alto and one in Cupertino, sending-side CPU utilization included,
> similar things can happen to receive-side CPU:
>
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> tardy.cup.hp.com (15.244.56.217) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
>
> 131072 219136    512    10.10       74.59    8.78     -1.00    9.648   -1.000
>
> raj at tardy:~/netperf2_work$ src/netperf -H tardy.cup.hp.com -c -- -m 512
> -s 128K -S 128K -D
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> tardy.cup.hp.com (15.244.56.217) port 0 AF_INET : nodelay
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
>
> 131072 219136    512    10.02       69.21   20.56     -1.00   24.335   -1.000
>
> The multiple concurrent request/response case is more nuanced and
> difficult to make. Basically, it is a race between how many small
> requests (or responses) will be made at one time, the RTT between the
> systems, the standalone ACK timer on the receiver, and the service time
> on the receiver.
>
> Here is some data with netperf TCP_RR between those two systems:
>
> raj at tardy:~/netperf2_work$ src/netperf -H tardy.cup.hp.com -c -t TCP_RR
> -- -r 128,2048 -b 3
> TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> tardy.cup.hp.com (15.244.56.217) port 0 AF_INET : first burst 3
> Local /Remote
> Socket Size   Request Resp.   Elapsed Trans.   CPU    CPU    S.dem   S.dem
> Send   Recv   Size    Size    Time    Rate     local  remote local   remote
> bytes  bytes  bytes   bytes   secs.   per sec  % S    % U    us/Tr   us/Tr
>
> 16384  87380  128     2048    10.00   1106.42  4.74   -1.00  42.852  -1.000
> 32768  32768
> raj at tardy:~/netperf2_work$ src/netperf -H tardy.cup.hp.com -c -t TCP_RR
> -- -r 128,2048 -b 3 -D
> TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> tardy.cup.hp.com (15.244.56.217) port 0 AF_INET : nodelay : first burst 3
> Local /Remote
> Socket Size   Request Resp.   Elapsed Trans.   CPU    CPU    S.dem   S.dem
> Send   Recv   Size    Size    Time    Rate     local  remote local   remote
> bytes  bytes  bytes   bytes   secs.   per sec  % S    % U    us/Tr   us/Tr
>
> 16384  87380  128     2048    10.01   2145.98  10.49  -1.00  48.875  -1.000
> 32768  32768
>
>
> Now, setting TCP_NODELAY did indeed produce a big jump in transactions
> per second. Notice though how it also resulted in a 14% increase in CPU
> utilization per transaction. Clearly the lunch was not free.
>
> The percentage difference in transactions per second shrinks as the
> number of outstanding transactions grows. Taking the settings from
> above, where the first column is the size of the burst in netperf, the
> second is without TCP_NODELAY set, the third with:
>
> raj at tardy:~/netperf2_work$ for i in 3 6 9 12 15 18 21 24 27; do echo $i
> `src/netperf -H tardy.cup.hp.com -t TCP_RR -l 4 -P 0 -v 0 -- -r 128,2048
> -b $i; src/netperf -H tardy.cup.hp.com -t TCP_RR -l 4 -P 0 -v 0 -- -r
> 128,2048 -b $i -D`; done
> 3 1186.40 2218.63
> 6 1952.53 3695.64
> 9 2574.49 4833.47
> 12 3194.71 4856.63
> 15 3388.54 4784.26
> 18 4215.70 5099.52
> 21 4645.97 5170.89
> 24 4918.16 5336.79
> 27 4927.71 5448.78
>
> If we increase the request size to 256 bytes, and the response to 8192
> (In all honesty I don't know what sizes sftp might use so I'm making
> wild guesses) we can see the convergence happen much sooner - it takes
> fewer of the 8192 byte responses to take the TCP connection to the
> bandwidth delay product of the link:
>
> raj at tardy:~/netperf2_work$ for i in 3 6 9 12 15 18 21 24 27; do echo $i
> `src/netperf -H tardy.cup.hp.com -t TCP_RR -l 4 -P 0 -v 0 -- -r 256,8192
> -b $i -s 128K -S 128K; src/netperf -H tardy.cup.hp.com -t TCP_RR -l 4 -P
> 0 -v 0 -- -r 256,8192 -s 128K -S 128K -b $i -D`; done
> 3 895.18 1279.38
> 6 1309.11 1405.38
> 9 1395.30 1325.44
> 12 1256.75 1422.01
> 15 1412.39 1413.64
> 18 1400.04 1419.76
> 21 1415.62 1422.79
> 24 1419.56 1420.10
> 27 1422.43 1379.72
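
Just to put a very rough number on that bandwidth delay product (both
figures below are my guesses, not measurements from the runs above):
with a ~100 Mbit/s path and an RTT around 2ms, the pipe only holds
about three of those 8192-byte responses, so it doesn't take many
outstanding transactions to fill it:

#include <stdio.h>

int main(void)
{
    /* Assumed, not measured: link speed and round-trip time. */
    const double bits_per_sec = 100e6;  /* ~100 Mbit/s */
    const double rtt_sec      = 0.002;  /* ~2 ms */

    double bdp_bytes = bits_per_sec * rtt_sec / 8.0;
    printf("BDP ~ %.0f bytes, ~%.1f responses of 8192 bytes\n",
           bdp_bytes, bdp_bytes / 8192.0);  /* ~25000 bytes, ~3 responses */
    return 0;
}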
In SFTP the WRITE request/reply sizes are more like 64kB/32B, and the
outstanding transactions are as many as the socket buffers will bear.
The slowdown is clearly due to 50ms outages from delayed ACK, which is
totally broken: the network just sits there idle for no good reason
whatsoever.
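
To put a number on how bad that is: if there's, say, 128kB of WRITEs
outstanding (an assumed socket buffer size, just for illustration) and
the transfer then sits through a ~50ms delayed-ACK stall before the
next batch can go out, throughput is capped at roughly
128kB / 50ms = 2.5MB/s no matter how fast the link is.  Besides
turning off Nagle on the sending side, Linux also lets a receiver opt
out of delaying ACKs with TCP_QUICKACK; the flag isn't sticky, so it
has to be re-armed around reads.  A sketch, not actual sftp code:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Linux-specific: ask the kernel to ACK incoming data immediately
 * instead of delaying.  The kernel may clear the flag again, so re-arm
 * it after each read. */
static ssize_t read_quickack(int fd, void *buf, size_t len)
{
    int one = 1;
    ssize_t n = read(fd, buf, len);
    (void)setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
    return n;
}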
I can make new traces, but I guess they would be very similar to the
ones I sent last time for the SFTP download case.
Miklos