[Bug 3578] New: RFE: forward error correction

bugzilla-daemon at mindrot.org bugzilla-daemon at mindrot.org
Tue Jun 13 09:47:34 AEST 2023


https://bugzilla.mindrot.org/show_bug.cgi?id=3578

            Bug ID: 3578
           Summary: RFE: forward error correction
           Product: Portable OpenSSH
           Version: 9.3p1
          Hardware: Other
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: ssh
          Assignee: unassigned-bugs at mindrot.org
          Reporter: openssh at richardneill.org

Rationale:

Sometimes the underlying link runs on flaky hardware, yet has plenty of
speed and bandwidth.

For example, Wi-Fi with sporadic bursts of radio interference, or a
failing network switch which drops 25% of packets, or resets every 10
seconds (being out for 5 seconds), even when the network is far from
saturated. In this case, the channel capacity and error rate are
oscillating between high and low.

This results in an unusable SSH connection because of packet drops
and, worse, the TCP congestion-control algorithm misinterprets the
losses as "network saturation / congestion", so it backs off, when the
correct response to this situation is "try harder".

This is particularly challenging when you are trying to remotely debug
over such a link, and SSH gets "stuck" even when some pings are getting
through.


Proposal:

I suggest a --flaky option, which would do 3 things:

* forward error-correction: preemptively transmit each packet 3x (both
from the client-end and the server-end) without waiting to find out
whether it was lost.

* tweak the TCP retransmission timeout for this connection to < 0.5
seconds, i.e. be much more aggressive about when a packet is deemed
lost and is retransmitted.

* if interactive, flash the cursor red to indicate the moments when it
is trying to retransmit.
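
As a rough illustration of the first bullet, the sketch below tags each
payload with a sequence number, sends it three times, and has the
receiver discard duplicates. It assumes a datagram-style framing
underneath the SSH stream; the names Packet, send_with_repetition and
receive_dedup are hypothetical and are not part of OpenSSH.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical framing: (sequence number, payload).
Packet = Tuple[int, bytes]

COPIES = 3  # the "3x" from the proposal


def send_with_repetition(payloads: Iterable[bytes],
                         copies: int = COPIES) -> Iterator[Packet]:
    """Yield every payload `copies` times, tagged with a sequence number."""
    for seq, payload in enumerate(payloads):
        for _ in range(copies):
            yield (seq, payload)


def receive_dedup(packets: Iterable[Packet]) -> Iterator[bytes]:
    """Deliver each sequence number once, in order of first arrival."""
    seen = set()
    for seq, payload in packets:
        if seq not in seen:
            seen.add(seq)
            yield payload


if __name__ == "__main__":
    sent = list(send_with_repetition([b"ls", b"cd /tmp", b"pwd"]))
    # Simulate a link that drops two out of every three packets.
    survived = [p for i, p in enumerate(sent) if i % 3 == 1]
    print(list(receive_dedup(survived)))  # [b'ls', b'cd /tmp', b'pwd']
```

In this sketch a payload is lost only if all three of its copies are
lost, at the cost of tripling the bandwidth used.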

An alternative might be to do this at the protocol layer, or to use some
sort of RAID-style error correction / data interleaving so that all the
data can be reconstructed even if only 1/3 of the packets are received.
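
A minimal sketch of the RAID-style idea, assuming the simplest possible
scheme: one XOR parity block per group of equal-length data blocks, as
in RAID 5. This repairs only a single lost block per group; surviving
the loss of 2/3 of the packets would need a stronger erasure code (for
example Reed-Solomon). The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*blocks))


def add_parity(data_blocks: List[bytes]) -> List[Optional[bytes]]:
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data blocks when at most one block (None) is missing."""
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    blocks = list(received)
    if missing:
        present = [b for b in blocks if b is not None]
        # XOR of all surviving blocks equals the missing block.
        blocks[missing[0]] = xor_blocks(present)
    return blocks[:-1]  # drop the parity block


if __name__ == "__main__":
    group = add_parity([b"abcd", b"efgh", b"ijkl"])
    group[1] = None  # simulate one lost packet
    print(recover(group))  # [b'abcd', b'efgh', b'ijkl']
```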


Examples:

* Voyager space-probe does forward error correction on transmitted
data, because it knows some data will be lost to interference.

* CD players can cope with huge blocks of read failures (experiment on
an unwanted disc - you can draw 8 radial lines with a 3 mm thick black
marker pen and it will still play back OK).


Test case:

If this works, it would allow a somewhat sluggish, but still usable
interactive SSH session to work, even if:

* 50% of packets, randomly selected, were dropped / delayed

* 50% of packets were dropped in square-wave bursts at a 100 ms,
1-second, or 10-second scale.
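
For the random-loss case, a back-of-the-envelope calculation shows what
3x repetition would buy. The sketch assumes each copy is lost
independently of the others; the function name is hypothetical.

```python
def delivery_probability(loss_rate: float, copies: int) -> float:
    """Chance that at least one of `copies` independent copies arrives."""
    return 1.0 - loss_rate ** copies


if __name__ == "__main__":
    for copies in (1, 2, 3):
        print(copies, delivery_probability(0.5, copies))
    # 1 0.5
    # 2 0.75
    # 3 0.875
```

The independence assumption does not hold for the burst case: copies
sent back-to-back fall into the same outage and are lost together, so
they would have to be spread out over more than one burst length (or
interleaved with other data) to help.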


I hope the idea is helpful - thank you for your time.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

