The complete answer (was Re: so-called-hang-on-exit)

Thu Aug 8 14:27:37 EST 2002

Ok, so I think I have a complete explanation for the difference between
the *BSD behaviour and the Linux/Solaris behaviour. Well, almost
complete :)

Pull out your trusty copies of "The Design and Implementation of the
4.4BSD Operating System" as well as "Unix Internals: The New Frontiers".
Specifically, pages 111-112 and 344 of the former and page 108 of
the latter.

It comes down to this:

 - The 4.4BSD tty and pty drivers send SIGHUP followed by SIGCONT
   (for stopped processes) to all orphaned process groups with a given
   tty/pty association when the session leader exits (TDI44BSDOS states
   that POSIX and 4.4BSD do this) - and any open file descriptors
   referring to the tty/pty in any processes that choose to continue
   running are revoked.

 - Whereas SVR4 doesn't do any of this and relies on the session leader
   to do its part and HUP/CONT its process groups. This part is not too
   clear because Uresh Vahalia mentions this very much in passing on
   page 108 of "Unix Internals."

   It is unclear whether closing a pty master causes the pty slave
   driver to take any action, such as sending signals to processes or
   revoking open file descriptors, and whether that depends on the
   session leader being alive or dead when the master is closed. This can be
   determined experimentally. Ideally closing the pty master after the
   pty slave's session leader has exited will cause the pty slave driver
   to revoke open fildes referring to it *and* to send HUP/CONT to
   remaining process groups with that pty slave association.

   My tests indicate that closing the master pty, on Solaris, does not
   cause the slave pty's open fildes to be revoked and no signal is
   sent, so orphaned background process groups continue to run and
   clutter up the process table *and*, apparently, they continue to
   consume a pty which cannot be reused until said processes exit or
   disassociate from their pty.

   What about Linux?
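
The experiment described above could be sketched roughly as follows. This is
not the test that was actually run; it is a minimal Python probe, and the
name probe_master_close and its timeout parameter are made up for
illustration. It covers only the case where the session leader is still
alive when the master is closed, and it assumes a platform where a session
leader acquires a controlling tty simply by opening the slave.

```python
import os
import pty
import signal
import time


def probe_master_close(timeout=2.0):
    """Close a pty master and report what happens to the slave's session leader.

    Returns the number of the signal that killed the child, or None if the
    child was still running (or exited on its own) when the timeout expired.
    """
    master, slave = pty.openpty()
    slave_name = os.ttyname(slave)
    ready_r, ready_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: become a session leader and adopt the slave as controlling tty.
        try:
            os.close(master)
            os.close(ready_r)
            os.setsid()
            # ASSUMPTION: opening the slave without O_NOCTTY makes it the
            # controlling tty; BSD-derived systems need ioctl(TIOCSCTTY).
            os.open(slave_name, os.O_RDWR)
            os.close(slave)
            os.write(ready_w, b"x")
            os.close(ready_w)
            time.sleep(timeout * 5)  # a SIGHUP with default action ends this
        finally:
            os._exit(0)

    os.close(slave)
    os.close(ready_w)
    os.read(ready_r, 1)   # wait until the child holds the slave open
    os.close(ready_r)
    os.close(master)      # the event under test

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done:
            return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
        time.sleep(0.05)

    os.kill(pid, signal.SIGKILL)  # child survived the close: clean up
    os.waitpid(pid, 0)
    return None
```

A result equal to signal.SIGHUP would mean the driver hung up the session
leader by itself; None would mean it left the process alone. Probing the
dead-leader case would need an extra background process group that outlives
the leader.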

My own tests today indicate that the Korn Shell, on Solaris, is smart
enough to send HUP/CONT to all of its process groups before exiting,
whereas the Solaris C-Shell is NOT. This points to a bug, or, rather,
the *lack of a feature* in the Solaris C-Shell. I don't care about the
C-Shell, so I won't file a bug report / RFE with Sun - if you care about
the C-Shell then you should, and check SunSolve as it may be that a
suitable patch already exists for all I know.
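
To make the shell-side strategy concrete, here is a minimal sketch of the
"HUP/CONT before exiting" step. It is not taken from ksh or any other real
shell; spawn_background_job and hup_process_groups are hypothetical helpers,
and the first one exists only so the second has something to signal.

```python
import os
import signal


def spawn_background_job():
    """Fork a child into its own process group, as a job-control shell would."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(ready_r)
            os.setpgid(0, 0)                              # new process group
            signal.signal(signal.SIGHUP, signal.SIG_DFL)  # HUP terminates
            os.write(ready_w, b"x")
            os.close(ready_w)
            while True:
                signal.pause()
        finally:
            os._exit(0)
    os.close(ready_w)
    os.read(ready_r, 1)  # wait until the child is in its own group
    os.close(ready_r)
    return pid           # the child's pid doubles as its process group id


def hup_process_groups(pgids):
    """Send SIGHUP, then SIGCONT, to each process group in pgids."""
    for pgid in pgids:
        try:
            os.killpg(pgid, signal.SIGHUP)
            # SIGCONT wakes stopped jobs so they can act on the pending HUP.
            os.killpg(pgid, signal.SIGCONT)
        except ProcessLookupError:
            pass  # the group is already gone
```

A shell following this approach would call something like
hup_process_groups() on its job list as the last step before exiting.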

I don't know what the exact Linux behaviour is, but I rather suspect
that it follows the SVR4 approach.

Comments elsewhere in this thread indicate that Bash 2.x is configurable
with respect to its behaviour on exit, through the 'huponexit' option. I
advise you all to read the Bash man page - search for 'huponexit'.

Which behaviour is best? To leave it to the shell to HUP/CONT its
process groups before exiting? Or to leave it to the tty/pty driver to
do the same? Each has its drawbacks, for example: the former can lead
to undesirable orphaned process groups cluttering the process table if
a shell fails to implement the SVR4 strategy, whereas the latter makes
it impossible to implement the Bash 'disown' built-in command feature.
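
The 'disown' point can be illustrated with a toy job table. This is only a
sketch of the idea, not how Bash implements it; JobTable and its methods are
hypothetical names, and the killpg parameter exists purely so the logic can
be exercised without real processes.

```python
import os
import signal


class JobTable:
    """Process groups the shell intends to HUP/CONT when it exits."""

    def __init__(self):
        self._pgids = set()

    def add(self, pgid):
        self._pgids.add(pgid)

    def disown(self, pgid):
        # Forgetting the job is all it takes: it will not be signalled.
        self._pgids.discard(pgid)

    def hup_on_exit(self, killpg=os.killpg):
        for pgid in sorted(self._pgids):
            killpg(pgid, signal.SIGHUP)
            killpg(pgid, signal.SIGCONT)
```

When the shell does the signalling, a disowned job simply drops out of the
list. When the driver does it, the shell has no such list to edit, which is
why the feature cannot be offered there.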

And making the driver responsible for sending the signals may imply
heavier structures and/or synchronization in the kernel.

******************************

So, Markus, Ben, et al., I recommend that you close all bugs related
to this hang-on-exit issue and document the problem as being buggy or
insufficiently featured shells on Linux/Solaris. Including a patch to
close the pty master when the session leader exits, on some platforms,
is probably a good idea, but it's not absolutely necessary - what
is absolutely necessary is that the shells on Linux/Solaris know to
send HUP/CONT to their process groups before exiting.

Jani, Frank, et al., make sure that your shell is configured correctly
and/or that you use a shell that correctly implements the SVR4 behaviour
and/or that you get or write patches for any broken shells. Merely
forcing the sshd to close the pty master might not be enough, but it
would be good if you could strace/truss an entire session with /bin/csh
as the session leader and a patched sshd that closes the pty master - I
would like to know what happens to backgrounded process groups in such
a case (see above).

Cheers,

Nico