[my summary] Re: hang on exit - bug or no bug?

Phil Howard phil-openssh-unix-dev at ipal.net
Sat Oct 6 04:24:41 EST 2001


Schieber, Dustin wrote:

> I definitely don't have the same problem with rsh, and never had it with
> ssh 1.2.31.  It was only after upgrading to Openssh that we experienced
> this problem. I can't dispute what has been said about the code in
> 1.2.31.  I can only state what my experience has been... we did not have
> this hanging problem with 1.2.31.

I have seen the "dangling descriptor" problem with rsh in non-pty
sessions.  With pty sessions, rsh worked the way telnet did: it
forced the session down, and the pty driver surely sent SIGHUP.

For interactive pty sessions, OpenSSH has this "problem" where
rsh and telnet did not.  But since these sessions are manual, ~.
is usable to force the session down (and presumably SIGHUP is sent
on the other end, if it is still reachable and gets the RST).
There's also kill.

The problem is scripted remote commands.  Data loss prevention
is definitely the #2 goal (security is #1).  The complication is
this: if the session parent process exits, is there still data
that needs to be sent while other processes hold the other ends
of the pipes open?  It _should_ be easy to make sure
that any data left in the pipe buffer from the exiting process
can be read, even though SIGCHLD will be handled before select()
wakes for the pipe read.  The obvious way is read to EOF.  The
other way is read to EWOULDBLOCK.  But is the EWOULDBLOCK way
really right?  No.  Perhaps on some systems the pipe buffer can
be swapped out even though the writing process has exited and
been waited.  Then there is the issue of background processes.
I do agree that in this case, waiting for EOF is the correct way.
If you're scripting the startup, script it so that the descriptors
are not passed on to any background processes; then sshd is sure
to see EOF.
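
To make the difference between the two stopping conditions concrete,
here is a minimal sketch in C.  It is not taken from sshd; the name
drain_fd and the result codes are made up for illustration, and the
data is only counted rather than forwarded.  It reads a pipe in
non-blocking mode and reports whether it stopped at EOF (every writer
has closed its end) or at EWOULDBLOCK (the buffer is empty for now,
but some process still holds the write end open).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum drain_result { DRAIN_EOF, DRAIN_WOULD_BLOCK, DRAIN_ERROR };

/*
 * Read whatever is currently buffered on fd without blocking.
 * The number of bytes read is added to *total.
 */
enum drain_result drain_fd(int fd, size_t *total)
{
    char buf[4096];
    int flags;

    assert(total != NULL);

    /* Switch the descriptor to non-blocking mode. */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return DRAIN_ERROR;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);

        if (n > 0) {
            *total += (size_t)n;        /* a real relay would forward buf */
            continue;
        }
        if (n == 0)
            return DRAIN_EOF;           /* all write ends are closed */
        if (errno == EINTR)
            continue;                   /* interrupted, e.g. by SIGCHLD */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return DRAIN_WOULD_BLOCK;   /* empty, but a writer remains */
        return DRAIN_ERROR;
    }
}
```

A DRAIN_WOULD_BLOCK result only says the buffer was empty at that
moment; a background process that still holds the write end can write
more later, which is why stopping there can lose data.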

Still, a problem remains.  Hangs happen even if there are no other
processes.  These don't happen very often, but I have seen them
happen.  Reproducing them is not easy.  Looking at the sshd with
lsof shows the pipe descriptors still open.  The session process
is gone so they can't be open anywhere else.  Doing strace on the
existing sshd process showed no activity.  I suspected some kind
of corrupted state in serverloop.  I have NOT seen this since 2.9p2
which I've been using for about 3 months.  Maybe it is fixed, but
I've also gone longer than 3 months without the problem happening
even in the earlier versions.  So I am still cautious, but it is
not a regularly occurring problem, either.  THIS is why I was
interested in this thread, hoping someone had seen this and figured
out the cause and fixed it.  But it seems the "fix" is for the
dangling descriptor problem (which I can get around).


> Your suggested workarounds have been discussed and in some cases already
> implemented on our hosts.   But the perception in my organization has
> been that Openssh is "broken" because this was not a problem before. The
> FAQ provides very little documentation on this problem, it's unclear if
> it's considered a bug or not.

Maybe it was really "broken" before vis-a-vis data loss from
background processes.  This is why I do think there should be an
option to force down a session when the child of sshd exits (after
enough reads to get EWOULDBLOCK and drain the pipe, or after a
timeout expires).  I can see cases where having this option would
be useful, but I would never want it as the default.

I'm thinking of writing a small C program to do this instead.
You would just execute a program via this program.  It would
make a new set of pipes and run the given command, and sit there
and copy data streams.  Then when the process it forked exits,
it would drain the pipes and exit.  The inherited std* descriptors
would not be passed to the child, so sshd or rshd would
definitely see EOF.  Two timeout options would be available.
One sets how long to wait after the child exits, if EOF is not
seen, before going ahead and exiting anyway.  The other would be
used to force an exit when the child itself does not exit.
With such a program, there would be no further need to add the
option to OpenSSH.
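
A rough sketch of the core of such a wrapper might look like the
following.  This is only an outline under several assumptions: the
names run_and_relay and write_all are invented here, only the child's
stdout/stderr are relayed (stdin is left out), and only the first
timeout (the grace period after the child exits) is implemented.  The
second timeout, for a child that never exits, is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in milliseconds. */
static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long)ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Run argv with stdout/stderr on a private pipe and copy its output
 * to out_fd.  Return when EOF is seen on the pipe, or when grace_ms
 * milliseconds have passed since the child exited without EOF.
 * Returns the child's wait status, or -1 on setup failure.
 */
int run_and_relay(char *const argv[], int out_fd, int grace_ms)
{
    int p[2], status = 0;
    long exit_time = -1;            /* -1 until the child is reaped */
    char buf[4096];
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    if (pipe(p) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: gets the private pipe, not the inherited stdout/stderr. */
        close(p[0]);
        dup2(p[1], STDOUT_FILENO);
        dup2(p[1], STDERR_FILENO);
        if (p[1] > STDERR_FILENO)
            close(p[1]);
        execvp(argv[0], argv);
        _exit(127);
    }
    close(p[1]);

    struct pollfd pfd = { .fd = p[0], .events = POLLIN };
    for (;;) {
        if (exit_time < 0 && waitpid(pid, &status, WNOHANG) == pid)
            exit_time = now_ms();
        if (exit_time >= 0 && now_ms() - exit_time >= grace_ms)
            break;                  /* child gone, still no EOF: give up */

        int r = poll(&pfd, 1, 50);
        if (r == 0 || (r < 0 && errno == EINTR))
            continue;               /* nothing yet: recheck the child */
        if (r < 0)
            break;

        ssize_t n = read(p[0], buf, sizeof buf);
        if (n == 0)
            break;                  /* EOF: every writer has closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (write_all(out_fd, buf, (size_t)n) == -1)
            break;
    }
    close(p[0]);
    if (exit_time < 0)
        waitpid(pid, &status, 0);   /* EOF arrived before the child exited */
    return status;
}
```

A main() around this would pass argv+1 and STDOUT_FILENO and turn the
wait status into its own exit code.  Because the grace period is
measured from the child's exit, output that a background process
writes after the grace period runs out is dropped, which is the
trade-off such a wrapper would be making on purpose.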

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| phil-nospam at ipal.net | Texas, USA | http://phil.ipal.org/     |
-----------------------------------------------------------------
