hang on exit bug under Linux
Doug Kingston
douglas.kingston at db.com
Fri Jan 4 03:14:38 EST 2002
Markus et al.,
Please pardon me if I've missed anything, but coming into this a bit late...
The hanging of ssh connections with openssh where a job has been
"detached" (backgrounded) is a real concern here as well. We have lots
of existing ssh and rsh scripts that we are converting over to later
versions of openssh, and have started running into this with startling
frequency, and its operational impact is serious when compounded by
regular invocations. We wind up with large numbers of waiting sshd's on
one side and/or hung scripts on the client side.
My understanding from reading code and the archives is that the current
behaviour we are seeing (a hang) is by design - and is intended to
ensure that any output from that spawned task is dutifully carried back
to the originating ssh client for appropriate disposition. The goal is
to avoid losing any data in the "end game" as things die off and
connections are closed.
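
To make the mechanism concrete, here is a rough sketch of the same effect outside of ssh. It is a simplified analogue, not sshd's code: the function name and the timeout are made up for illustration. The reader of a pipe only sees EOF once every process holding the write end has closed it, so a backgrounded job that inherited stdout keeps the reader waiting after the shell itself has exited.

```python
import os
import select
import subprocess

def eof_seen_after_parent_exit(command, timeout=0.5):
    """Run `command` via sh with stdout on a pipe. After the shell exits,
    report whether the pipe reaches EOF within `timeout` seconds."""
    read_fd, write_fd = os.pipe()
    shell = subprocess.Popen(["sh", "-c", command],
                             stdin=subprocess.DEVNULL, stdout=write_fd)
    os.close(write_fd)  # drop our copy; only the shell and its children hold it
    shell.wait()        # the primary child has now exited
    try:
        ready, _, _ = select.select([read_fd], [], [], timeout)
        # Readable with zero bytes means EOF: every write end is closed.
        return bool(ready) and os.read(read_fd, 4096) == b""
    finally:
        os.close(read_fd)
```

Calling it with "sleep 2 &" should report no EOF within the timeout, because the sleep still holds the pipe, while "sleep 2 >/dev/null 2>&1 &" should report EOF as soon as the shell is gone.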
However, the prior behavior of rsh and ssh was different, and the
termination of the remote connection was governed by the death of all
child processes (or the only child process). It is this behaviour that
people would like to see again. I agree that technically (in the sense
of not losing data), the new behavior is more correct, and users should
wrap programs they are trying to detach to close all FDs (easily done
with a Perl program and probably shell as well). People would like to
control this on the client side, but it's completely server-side behavior
and currently there is no way for the client to influence this other
than to recode their scripts (which in our case is thousands of scripts).
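
For reference, the kind of wrapper meant above might look something like this. It is a minimal sketch in Python rather than Perl or shell, and the name detach is hypothetical: it starts the command with stdin, stdout and stderr pointed at /dev/null, closes every other inherited descriptor, and puts the job in its own session.

```python
import subprocess

def detach(argv):
    """Start argv in the background holding none of our descriptors.
    Returns the child's pid; the caller does not wait for it."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        close_fds=True,          # close every inherited descriptor above 2
        start_new_session=True,  # setsid(): run the job in a new session
    )
    return proc.pid
```

A script would call detach(["some_job", "arg"]) in place of "some_job arg &". Any output the job produces is discarded, which is exactly the trade-off being discussed.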
What we need is a way to support the old functionality but in a way that
lets us migrate smoothly over time to the new behavior. I believe that
a few modifications can be made to the client and server to support both
the new and old behavior, and the controlling of the default behavior.
First the server side changes:
1. add an option to terminate when the primary or all child processes die.
2. add an option to set the default for this flag in the sshd_config file
(default should probably be for the old behavior to be compatible
with v1.3 and 1.5 clients)
3. add code to allow the client side to set this option (client should
override server)
(I think this needs to be a SSH2_MSG_GLOBAL_REQUEST)
Only ssh v2.0 clients will be able to set this option.
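
As a sketch of how the server might combine these three points (all names here are hypothetical, and this is not existing sshd code): the sshd_config default applies unless a protocol 2 client has explicitly asked for something else.

```python
from enum import Enum
from typing import Optional

class ExitPolicy(Enum):
    ON_CHILD_EXIT = "child-exit"  # old behavior: close when the child process(es) die
    ON_FD_CLOSE = "fd-close"      # new behavior: wait until all output FDs are closed

def effective_policy(server_default: ExitPolicy,
                     client_request: Optional[ExitPolicy],
                     protocol_major: int) -> ExitPolicy:
    """Choose the policy for one session.
    A request from a v2 client overrides the server default; v1 clients
    cannot send one, so they always get the default."""
    if protocol_major >= 2 and client_request is not None:
        return client_request
    return server_default
```

With the default set to ON_CHILD_EXIT, older clients keep the behavior they expect, and a newer client can still opt in to ON_FD_CLOSE per connection.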
Client side changes:
1. add code to send new option described above.
2. add code to set the default setting of the option in ssh_config
3. add command line processing to override default and send desired
option setting to server.
We need to be aware that there are many different versions of ssh client
code out there, much of it beyond our control, and we need to ensure
that it continues to operate as expected when we upgrade the server
(backwards compatibility). This means making the default server
behavior accommodate the older client expectations unless it knows it's
got a newer client that wants the new behavior. Once a site has
converted their environment, they can change the default to wait for all
output FDs to close before exiting.
How does this proposal sound to folks? Markus? What have I missed...
-Doug-
--
Douglas Kingston
Director
Global Unix Engineering Manager
Deutsche Bank AG London
6 Bishopsgate
London EC2N 4DA
Work: +44-20-7545-3907
Mobile: +44-7767-616-028