Server/Client Alive mechanism issues
Darryl Miles
darryl-mailinglists at netbauds.net
Fri Jan 10 04:18:51 EST 2014
Old thread, I know, but I have the opposite problem. Maybe SSH was
changed in connection with this report? See my recent (Jan 2014) ML thread.
I am observing SSH waiting for a TCP-level timeout to occur when the
other end has gone away (and is not sending back any data or a TCP RST).
Jeff Mitchell wrote:
> I have a bandwidth-constrained connection that I'd like to run rsync
> over through an SSH tunnel. I also want to detect any network drops
> pretty rapidly.
If you are bandwidth-constrained, why are you wasting bandwidth on
1-second ping-pongs? What percentage of your overall data are you
wasting on that effort?
Does your usage of the application require connection recovery (for a
stalled, non-working connection) within tens of seconds? So you are in
a bandwidth-constrained environment trying to send bulk data, and you
must know if the other end has become unavailable within 6 seconds of
it doing so?
If you're bandwidth-constrained I would have thought both ends would be
patient when waiting for data, and that turning up the Alive Interval
(to, say, 10 seconds) and turning down the CountMax (to, say, 2) is a
better way to go, increasing the Interval further as necessary.
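For illustration, a client-side ~/.ssh/config sketch along those lines
(the host alias and the exact values are assumptions to adapt to your
link, not recommendations):

    Host bulk-transfer-host          # hypothetical host alias
        ServerAliveInterval 10       # probe only after 10s of silence
        ServerAliveCountMax 2        # give up after 2 missed replies

The server-side equivalents in sshd_config are ClientAliveInterval and
ClientAliveCountMax.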
> After about 5 seconds, the connection is being dropped, but during that
> time the rsync is successfully transferring data near the full bandwidth
> of the connection.
Maybe you can ask the SSH client/server (on both sides, or at least the
side with the most data being pushed) to turn down SO_SNDBUF to
minimize the kernel buffer. This can be done on a socket-by-socket
basis using the setsockopt() C API, so it is something ssh/sshd would
need to implement on your behalf.
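A minimal sketch of the kind of call ssh/sshd would have to make on the
connection's already-connected TCP socket (the function name, the
sock_fd variable and the 8 KiB value are illustrative assumptions, not
anything taken from the OpenSSH source):

    #include <stdio.h>
    #include <sys/socket.h>

    /* Shrink the kernel send buffer on a connected TCP socket.
     * Returns 0 on success, -1 on error (error printed via perror). */
    static int shrink_send_buffer(int sock_fd, int bytes)
    {
        /* Linux doubles the value internally (see socket(7)) and
         * will not go below its documented minimum of 2048 bytes. */
        if (setsockopt(sock_fd, SOL_SOCKET, SO_SNDBUF,
                       &bytes, sizeof(bytes)) < 0) {
            perror("setsockopt(SO_SNDBUF)");
            return -1;
        }
        return 0;
    }

    /* e.g. shrink_send_buffer(sock_fd, 8 * 1024); */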
When the connection is sending, if you run "netstat -tanp" (on Linux)
the number of bytes queued in the kernel buffer is shown in the Send-Q
column. Reducing SO_SNDBUF decreases this value, but with the side
effect of causing the sending process to wake up more often to refill
the kernel buffer. It sounds like your CPU processing power far exceeds
the network throughput, so I do not think this will be a concern in
your scenario.
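To cross-check what the kernel actually applied (remembering that Linux
reports the doubled value), a small getsockopt() readback sketch; here
sock_fd is again an assumed, already-connected socket:

    #include <stdio.h>
    #include <sys/socket.h>

    /* Print the effective kernel send-buffer size for a socket.
     * Linux returns double the value passed to setsockopt(). */
    static void print_send_buffer(int sock_fd)
    {
        int bytes = 0;
        socklen_t len = sizeof(bytes);

        if (getsockopt(sock_fd, SOL_SOCKET, SO_SNDBUF, &bytes, &len) == 0)
            printf("effective SO_SNDBUF: %d bytes\n", bytes);
        else
            perror("getsockopt(SO_SNDBUF)");
    }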
The lowest value for SO_SNDBUF according to the Linux man page is 2048
bytes. Note that if you make this value too low and your CPU does not
refill the kernel buffer and it underruns (i.e. the TCP stack could
have sent data but none was available because the application did not
wake up and write() data quickly enough), it will hurt performance, as
TCP slow-start congestion control may reset, causing overall measured
throughput to drop.
man 7 socket (search SO_SNDBUF)
man 7 tcp
On Linux see also /proc/sys/net/core/wmem_default for the system-wide
default, if the SSH application does not offer an option.
You can 'cat /proc/sys/net/core/wmem_default' to see the current value;
going below 32k on a 100mbit (or better) Ethernet system is probably a
bad idea.
Note that it is only for the bandwidth-restricted application that you
want to tweak this; setting it too low will have a major effect on the
normal performance of a regular Ethernet-based system.
>
> My understanding is that since the alive mechanism is running inside the
> encrypted connection, OpenSSH would be able to (and would) prioritize
> the alive packets over other data. So if any data is able to get through
> (and it does) the alive packets should be able to as well. But this
> doesn't seem to be the case.
No. While SSH is able to multiplex different streams inside a single
TCP connection, the aggregated stream is still subject to kernel Send-Q
buffering and then network latency, congestion and performance metrics.
So what you are doing is taking networking parameters tuned for system
memory and Ethernet performance (in the case of Linux again) and trying
to use them with bandwidth-restricted connectivity.
The OS-picked default wmem/sendq is based on system memory and other
such inter-related parameters, allowing auto-tuning.
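For reference, these are the Linux places to look (shown only as where
the defaults and the auto-tuning range live, not as suggested values;
see socket(7) and tcp(7)):

    cat /proc/sys/net/core/wmem_default   # system-wide default send buffer
    cat /proc/sys/net/core/wmem_max       # ceiling for setsockopt(SO_SNDBUF)
    cat /proc/sys/net/ipv4/tcp_wmem       # TCP auto-tune: min, default, max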
Darryl