hang on exit bug under Linux

Fri Jan 4 03:14:38 EST 2002

Markus et al.,

Please pardon me if I've missed anything, but coming into this a bit late...

The hanging of ssh connections with openssh where a job has been 
"detached" (backgrounded) is a real concern here as well.  We have lots 
of existing ssh and rsh scripts that we are converting over to later 
versions of openssh, and have started running into this with startling 
frequency, and its operational impact is serious when compounded by 
regular invocations.  We wind up with large numbers of waiting sshd's on 
one side and/or hung scripts on the client side.

My understanding from reading code and the archives is that the current 
behaviour we are seeing (a hang) is by design - and is intended to 
ensure that any output from that spawned task is dutifully carried back 
to the originating ssh client for appropriate disposition.  The goal is 
to avoid losing any data in the "end game" as things die off and 
connections are closed.

However, the prior behavior of rsh and ssh was different, and the 
termination of the remote connection was governed by the death of all 
child processes (or the only child process).  It is this behaviour that 
people would like to see again.  I agree that technically (in the sense 
of not losing data), the new behavior is more correct, and users should 
wrap programs they are trying to detach to close all FDs (easily done 
with a Perl program and probably shell as well).  People would like to 
control this on the client side, but it's completely server-side behavior 
and currently there is no way for the client to influence this other 
than to recode their scripts (which in our case means thousands of scripts).
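
For illustration, the kind of wrapper mentioned above might look roughly 
like the following.  This is only a sketch, in Python rather than Perl 
or shell, and the name spawn_detached is made up for this example.  It 
assumes that pointing stdin, stdout and stderr at /dev/null and closing 
every other inherited descriptor is enough for sshd to stop waiting on 
the job.

```python
import subprocess


def spawn_detached(argv):
    """Start argv so it shares no descriptors with the caller; return its PID.

    Hypothetical helper for illustration only, not part of openssh.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,   # fds 0, 1 and 2 point at /dev/null rather
        stdout=subprocess.DEVNULL,  # than at the pipes or pty that sshd is
        stderr=subprocess.DEVNULL,  # still watching for output
        close_fds=True,             # close every other inherited descriptor
        start_new_session=True,     # setsid(): leave the login session
    )
    return proc.pid
```

A script run over ssh would call something like 
spawn_detached(["/path/to/job"]) (the path is a placeholder) and then 
exit normally.  Any output the job produces is thrown away, which is 
exactly the data loss the current server behavior is trying to prevent.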

What we need is a way to support the old functionality but in a way that 
lets us migrate smoothly over time to the new behavior.  I believe that 
a few modifications can be made to the client and server to support both 
the new and old behavior, and to control which one is the default.

First the server side changes:
1. add an option to terminate the connection when the primary child
    process (or all child processes) die.
2. add an option to set the default for this flag in the sshd_config file
    (default should probably be for the old behavior to be compatible
    with v1.3 and 1.5 clients)
3. add code to allow the client side to set this option (client should
    override server)
    (I think this needs to be a SSH2_MSG_GLOBAL_REQUEST)
    Only ssh v2.0 clients will be able to set this option.

Client side changes:
1. add code to send the new option described above.
2. add code to set the default setting of the option in ssh_config
3. add command line processing to override the default and send the
    desired option setting to the server.
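
To make the intended precedence concrete, here is a rough sketch of how 
the server might combine its configured default with a client request, 
and then decide when to close the session.  It is written in Python 
purely for illustration.  ExitPolicy, effective_policy and should_close 
are invented names, not existing openssh code or options.

```python
from enum import Enum
from typing import Optional


class ExitPolicy(Enum):
    """Hypothetical names for the two behaviors discussed in this proposal."""
    ON_CHILD_EXIT = "child-exit"  # old behavior: close when the child dies
    ON_FDS_CLOSED = "fds-closed"  # new behavior: wait for all output FDs


def effective_policy(server_default: ExitPolicy,
                     client_request: Optional[ExitPolicy],
                     protocol_major: int) -> ExitPolicy:
    """The client overrides the server, but only a v2 client can ask."""
    if protocol_major >= 2 and client_request is not None:
        return client_request
    return server_default


def should_close(policy: ExitPolicy, child_exited: bool,
                 open_output_fds: int) -> bool:
    """Decide whether the server may tear the session down now."""
    if not child_exited:
        return False
    if policy is ExitPolicy.ON_CHILD_EXIT:
        return True
    # ON_FDS_CLOSED: keep waiting while anything still holds an output FD.
    return open_output_fds == 0
```

With the server default set to ON_CHILD_EXIT, a v1 client, or a v2 
client that sends nothing, gets the old behavior, and only a v2 client 
that explicitly asks gets the wait-for-FDs behavior.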

We need to be aware that there are many different versions of ssh client 
code out there, much of it beyond our control, and we need to ensure 
that it continues to operate as expected when we upgrade the server 
(backwards compatibility).  This means making the default server 
behavior accommodate the older client expectations unless it knows it's 
got a newer client that wants the new behavior.  Once a site has 
converted their environment, they can change the default to wait for all 
output FDs to close before exiting.

How does this proposal sound to folks?  Markus?  What have I missed...

-Doug-

-- 

Douglas Kingston
Director
Global Unix Engineering Manager

Deutsche Bank AG London
6 Bishopsgate
London EC2N 4DA

Work:	+44-20-7545-3907
Mobile:	+44-7767-616-028