TCP Forwarding hangs when TCP service is unresponsive, even when TCP client exits

Tue Sep 20 10:41:35 AEST 2022

On 2022-09-19 01:55, Damien Miller wrote:
> This is kind of a tricky case, because for some cases it's AFAIK impossible
> for the client to discern between a TCP server that a) will never respond
> from b) hasn't responded *yet*.
> 
> The solution that you proposed is unfortunately not without side
> effects - I think it changes the behaviour of half-closed TCP connection
> in a way that might lose data.

[...]

> I do notice some different behaviour between Linux (above) and OpenBSD.
> On OpenBSD the connection is accepted but obviously does not pass any
> data (of course). This is harder to fix without the side effects I
> mentioned above, e.g. consider a TCP client program that connects to a
> forwarded socket, sends a message and exits without waiting for a reply.
> I think setting c->force_drain in this case could cause the message to be
> lost (though I'm not 100% sure).

I wrote a test TCP client/server pair, designed so that the TCP client 
sends a message and then exits while the server is still sleeping. With 
my patch applied, the SSH client and SSH server both exited while the 
TCP server was sleeping, yet the TCP server still received the message 
once it woke up. Presumably this is because the kernel had already 
queued the message in the socket's receive buffer.
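
For reference, here is a minimal sketch of that kind of test. It is not
the actual test program, and it talks to the server directly rather than
through an SSH forwarding, so it only illustrates the kernel buffering
part. The names sleepy_server and run_demo, the loopback address and the
delay are all made up for illustration.

```python
import socket
import threading
import time


def sleepy_server(listener, delay, result):
    """Sleep before accepting, then read everything the client sent."""
    time.sleep(delay)  # the server is "unresponsive" during this time
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # EOF: the client closed long ago
                break
            chunks.append(data)
    result.append(b"".join(chunks))


def run_demo(message=b"hello from a client that already exited\n", delay=0.5):
    """Send a message, close the client, and return what the server reads."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    result = []
    server = threading.Thread(target=sleepy_server,
                              args=(listener, delay, result))
    server.start()

    # The client connects, sends and closes while the server still sleeps.
    # The handshake completes in the listen backlog before accept() runs.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)

    server.join()
    listener.close()
    return result[0]


if __name__ == "__main__":
    print(run_demo())
```

The server still reads the full message after the client has gone, which
is consistent with the data sitting in a kernel socket buffer. It says
nothing about what happens to data still in flight inside an SSH channel.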

Still, I've been thinking and reading about this more, and you're 
probably right that my change would break things for some edge cases, 
particularly for half-closed TCP connections that are deliberately 
persistent.

https://superuser.com/questions/298919/what-is-tcp-half-open-connection-and-tcp-half-closed-connection
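
To make the half-closed case concrete, here is a small sketch of a
client that closes only its sending direction and then waits for a
reply. The function names and the upper-casing "protocol" are invented
for the example; the point is only that data legitimately flows back
after the client's FIN, which is the traffic an early teardown of the
forwarding could drop.

```python
import socket
import threading


def upper_server(listener):
    """Read until the client half-closes, then send a reply back."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while True:
            data = conn.recv(4096)
            if not data:  # client sent FIN: its write side is closed
                break
            request += data
        # The connection is half-closed, but this direction still works.
        conn.sendall(request.upper())


def half_close_demo(request=b"status?"):
    """Send a request, half-close, and return the reply that follows."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=upper_server, args=(listener,))
    server.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(request)
        client.shutdown(socket.SHUT_WR)  # half-close: done sending
        reply = b""
        while True:
            data = client.recv(4096)
            if not data:  # server closed its side too
                break
            reply += data

    server.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(half_close_demo())
```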

For a while I was thinking that the SSH server (for the 
remote-forwarding case) ought to simply close the channel when its 
children have exited, but the trouble there is that clients of the 
forwarded TCP port need not be children of the SSH server, so I don't 
think that is viable.

As a workaround external to SSH itself, I am considering making my 
remote wrapper script kill its parent SSH server process once the TCP 
client (puppet, in my case) has exited. I don't know how to guarantee 
that the channel carrying stdout/stderr has flushed all the data back to 
the SSH client, though; killing the SSH server process prematurely might 
lose in-flight data.
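
A rough sketch of such a wrapper is below. It assumes the wrapper's
direct parent is the per-session SSH server process, which may not hold
if a login shell sits in between. The function name and the injectable
kill/getppid parameters are hypothetical and are there only so the logic
can be exercised without killing anything. The flush calls only push the
wrapper's own buffered output to the SSH server; they do not address the
open question of whether the server has relayed that data to the SSH
client yet.

```python
import os
import signal
import subprocess
import sys


def run_then_signal_parent(cmd, kill=os.kill, getppid=os.getppid):
    """Run cmd to completion, then send SIGTERM to the parent process.

    Assumption: the parent is the SSH server process for this session.
    """
    status = subprocess.run(cmd).returncode

    # Flush this wrapper's own buffers. This does NOT guarantee that the
    # SSH server has forwarded the data to the client before it dies.
    sys.stdout.flush()
    sys.stderr.flush()

    kill(getppid(), signal.SIGTERM)
    return status
```

On the remote side this would be called with the real command, for
example run_then_signal_parent(["puppet", ...]) with whatever arguments
the job needs.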

If we don't have a better alternative, would it be acceptable to gate 
the proposed use of force_drain behind a new config option (default to 
disabled)?

Thanks,
Corey