The complete answer (was Re: so-called-hang-on-exit)

Nicolas Williams Nicolas.Williams at ubsw.com
Fri Aug 9 00:54:04 EST 2002


On Wed, Aug 07, 2002 at 10:12:51PM -0700, Frank Cusack wrote:
> On Thu, Aug 08, 2002 at 12:27:37AM -0400, Nicolas Williams wrote:
[...]
> >    My tests indicate that closing the master pty, on Solaris, does not
> >    cause the slave pty open fildes to be revoked and no signal is
> 
> But it must, otherwise changes in ownership would give access that
> shouldn't be there to the bg process group, No?

No, it doesn't revoke the open file descriptors on the pty slave. What
does happen is that the processes in the orphaned process groups are
disassociated from the pty slave, but their open file descriptors are
not revoked, so those processes continue to hold a pty slave whose
master is now *not* available for reuse.

Try it. Use 'sleep' instead of 'yes' (in most cases the bg processes
don't interact with the tty), and use /bin/csh (remember, KSH/Bash
don't have this problem). I did: after typing ~. there was no sshd
holding the pty master. lsof <pty slave> showed the bg sleep processes
still holding references to the pty, but ps -ef showed those processes
as having no associated terminal (there was a '?' in the terminal
column). More importantly, the next ssh did not get the same pty as the
previous one, because the previous session had left bg procs with open
file descriptors referencing the pty slave.
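
For anyone who wants to repeat the check above without eyeballing lsof
and ps output, here is a rough Python sketch. It assumes lsof and a ps
that accepts '-o tty= -p PID' are installed; the function names are made
up for illustration, and the '??' and '-' spellings of "no terminal" are
assumptions about systems other than the ones tested here.

```python
import subprocess


def parse_lsof_pids(lsof_output):
    """Turn `lsof -t` output (one PID per line) into a sorted list of unique PIDs."""
    pids = set()
    for line in lsof_output.splitlines():
        line = line.strip()
        if line.isdigit():
            pids.add(int(line))
    return sorted(pids)


def has_no_ctty(ps_tty_field):
    """True if the ps TTY column says the process has no controlling terminal."""
    # '?' is what ps printed in the test described above; '??' and '-'
    # are assumed spellings on other systems.
    return ps_tty_field.strip() in ("?", "??", "-")


def detached_holders(slave_path):
    """PIDs that still hold slave_path open but have no controlling terminal."""
    lsof = subprocess.run(["lsof", "-t", slave_path],
                          capture_output=True, text=True)
    holders = []
    for pid in parse_lsof_pids(lsof.stdout):
        ps = subprocess.run(["ps", "-o", "tty=", "-p", str(pid)],
                            capture_output=True, text=True)
        # An empty field means the process exited in the meantime; skip it.
        if has_no_ctty(ps.stdout):
            holders.append(pid)
    return holders
```

Calling detached_holders() with the path of the pty slave from the
previous session should list the leftover bg sleep processes, if the
behaviour described above holds on your platform.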

> >    sent, so orphaned background process groups continue to run and
> >    clutter up the process table *and*, apparently they continue to
> >    consume a pty which cannot be reused until said processes exit or
> >    disassociate from their pty.
> > 
> >    What about Linux?
> 
> Interesting.  If you look at linux/drivers/char/tty_io.c:disassociate_ctty()
>  you can see that SIGHUP and SIGCONT are sent.  disassociate_ctty() is called
> from linux/kernel/exit.c if the exiting process is the session leader.
> Yet Linux has the problem.

Which processes get HUP/CONT? Can you trace some /bin/csh and bash
sessions (with huponexit on)?

> > So, Markus, Ben, et al., I recommend that you close all bugs related
> > to this hang-on-exit issue and document the problem as being buggy or
> > insufficiently featured shells on Linux/Solaris. Including a patch to
> > close the pty master when the session leader exits, on some platforms,
> > is probably a good idea, but it's not absolutely necessary - what
> > is absolutely necessary is that the shells on Linux/Solaris know to
> > send HUP/CONT to their process groups before exiting.
> 
> Ack!  I would say it is needed.  There are *plenty* of other implemented
> workarounds for weird behavior on xyz given platform.  It's too painful
> to have to set your shell vars correctly, etc.  What if you use a shell
> that doesn't support that kind of thing?  etc.

The patch merely prevents the hanging ssh/sshd. It does NOT prevent
the accumulation of useless bg procs that burn up a pty.

I seem to remember one poster complaining bitterly about those bg procs
hanging around. Your patch won't help him - neither will Jani's, or
Markus'.

> The patch does no harm.  There's no reason not to include it.

This is true, but the underlying problem remains.

The correct patch would actually do a vhangup() on the pty master,
except that that would be incorrect: vhangup() is a hangover from the
old 4.3BSD days (before BSD had sessions), so the semantics implemented
by Solaris, which are the 4.3BSD semantics, are incorrect (see the man
page).

So the truly correct fix would be to implement an updated vhangup() with
session-based semantics - but this is very difficult to do in
user-level code! Then again, there is lsof, so parse the output of
'lsof <pty slave>' and kill -HUP/CONT those processes. And then call the
real vhangup() (because Solaris has no revoke(2) syscall, but vhangup()
gets that part right, see?).
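
A rough sketch of the lsof-and-kill part of that idea, in Python. This
is an illustration only, not a proposed patch: it assumes lsof is
installed and that 'lsof -t' prints one PID per line, the function names
are hypothetical, and the final vhangup() call is left out entirely. The
'kill' parameter exists only so the loop can be tried without signalling
real processes.

```python
import os
import signal
import subprocess


def hup_cont(pids, kill=os.kill):
    """Send SIGHUP, then SIGCONT, to each PID; return the PIDs we could not HUP."""
    failed = []
    for pid in pids:
        try:
            kill(pid, signal.SIGHUP)
        except (ProcessLookupError, PermissionError):
            failed.append(pid)
            continue
        try:
            # SIGCONT wakes a stopped process so it can act on the SIGHUP.
            kill(pid, signal.SIGCONT)
        except (ProcessLookupError, PermissionError):
            pass  # most likely already gone after the SIGHUP
    return failed


def hangup_slave_holders(slave_path):
    """HUP/CONT every process that lsof reports as holding slave_path open."""
    out = subprocess.run(["lsof", "-t", slave_path],
                         capture_output=True, text=True).stdout
    pids = [int(tok) for tok in out.split() if tok.isdigit()]
    # Never signal ourselves, in case we hold the slave open too.
    return hup_cont([p for p in pids if p != os.getpid()])
```

Note that this signals individual processes rather than sessions or
process groups, so it only approximates the session-based semantics
discussed above.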

But the really, truly, actually correct fix is to get your shells to do
The Right Thing (tm) on SVR4 derivatives (and Linux?).
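
What "The Right Thing" amounts to can be sketched in a few lines of
Python: just before the shell (the session leader) exits, it walks its
job table and sends SIGHUP followed by SIGCONT to each job's process
group. The job table here is simply a list of process group IDs, and the
function name is hypothetical; a real shell would do this in C from its
exit path. The 'killpg' parameter is there only so the sketch can be
exercised without real jobs.

```python
import os
import signal


def hangup_jobs(job_pgids, killpg=os.killpg):
    """Send SIGHUP then SIGCONT to each job's process group before exiting.

    Returns the process group IDs that were actually signalled.
    """
    signalled = []
    for pgid in job_pgids:
        try:
            killpg(pgid, signal.SIGHUP)
        except ProcessLookupError:
            continue  # the job has already finished
        try:
            # Stopped jobs need SIGCONT to wake up and see the SIGHUP.
            killpg(pgid, signal.SIGCONT)
        except ProcessLookupError:
            pass  # the whole group died from the SIGHUP
        signalled.append(pgid)
    return signalled
```

A shell would call something like this once, right before exit, with
the process group IDs of all its remaining background jobs.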

> > Jani, Frank, et al., make sure that your shell is configured correctly
> > and/or that you use a shell that correctly implements the SVR4 behaviour
> 
> I've known about huponexit forever.  (RedHat documents this as a workaround.)
> This is not an acceptable workaround for me.  Now, this isn't really a

Why? Do your users use that evil C Shell? :)

BTW, you can see why the C Shell is the problem here: no vendor would
re-implement it when they can just use the BSD code with the absolute
minimum of changes. And since this difference in pty/tty driver / shell
behaviour between 4.4BSD + derivatives and SVR4 + derivatives is sooo
underdocumented, it's easy to see how vendors of SVR4 derivatives might
not have updated /bin/csh to HUP/CONT its process groups before exiting.
This is a rather obscure problem that hopefully will no longer be
obscure.

But, we all know that the C Shell is evil and so avoid it, right? right?

http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=Csh+programming+considered+harmful

Once again, this explains why I've not cared too much about this problem
before, as I use KSH and Bash and so never experienced it.

> problem b/c I have to maintain a local version of openssh anyway, but
> I always prefer to have minimal changes, and other folks want/need this
> also!
> 
> Thanks for investing time in this, Nico.

Oh, it was fun. :)

Besides, now I know how to answer users here who do run into this
hang-on-exit problem ("switch to a real shell," perhaps :) :)


> /fc


Cheers,

Nico
-- 
-DISCLAIMER: an automatically appended disclaimer may follow. By posting-
-to a public e-mail mailing list I hereby grant permission to distribute-
-and copy this message.-

More information about the openssh-unix-dev mailing list