SSH connection hanging on logout
John Bowman
bowman at math.ualberta.ca
Thu May 10 23:41:37 EST 2001
> > Try against the latest CVS snapshot. Consider even applying the patch
> > Markus put out for turning blocking I/O back on. This is kind of a
> > separate issue. I can show this problem exists independent of his patch.
>
> This was the latest CVS.
Does protocol 1 still break (I assume you are using OpenBSD?) when my
hang-on-exit patch is applied to openssh-2.9?
Let's not make the issue any murkier than it already is by applying the
patch to CVS snapshots, which are subject to continual change. In other
words, let's use 2.9 and 2.9p1 as controls for these tests and vary only
one thing at a time (the patch).
>
> > But you're right. This does not solve protocol 1 hang on exit. It just
> > solves protocol 2 which is a hint that it's the wrong solution.
> >
> > I must point out that this 'work around' is only required for a LIMITED
> > number of platforms. Which I believe is HP/UX and Linux at this point.
> > Which leads me to believe there is something unique to those platforms.
> > So it may cause failure on platforms that don't require this work around.
>
> I suspect that the problem may be with the Linux kernel itself and how it
> handles file descriptors shared between processes. OpenBSD and Solaris
> don't exhibit the problem; the sshd child's fds to the shell get properly
> closed when it exits.
Good. At least we have now established beyond any doubt that this really
*is* a bug under HP-UX and Linux (whether one wants to attribute it to the
OS or to openssh is irrelevant to me; it still needs a workaround either way).
If the hanging behaviour were actually the "correct" behaviour, openssh
would hang on other platforms too, right?
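As a local illustration of the shared-descriptor issue discussed above, here is a minimal Python sketch. It is not OpenSSH code, and the helper name time_to_eof is hypothetical. It uses an ordinary pipe as a stand-in for the descriptors between sshd and the shell, and shows only the general mechanism by which a background job that inherits a descriptor can delay end-of-file for the reader. It does not show where the platform difference reported above comes from.

```python
import subprocess
import time


def time_to_eof(shell_command):
    """Return (seconds until the shell exited, seconds until EOF on its stdout).

    Hypothetical helper for illustration only; a pipe stands in for the
    descriptors that sshd shares with the login shell.
    """
    start = time.monotonic()
    proc = subprocess.Popen(["sh", "-c", shell_command], stdout=subprocess.PIPE)
    proc.wait()                 # the shell itself exits almost at once
    exited = time.monotonic() - start
    proc.stdout.read()          # blocks until every holder of the write end closes it
    eof = time.monotonic() - start
    proc.stdout.close()
    return exited, eof


if __name__ == "__main__":
    # The background sleep inherits stdout, so EOF arrives only after it ends.
    print(time_to_eof("sleep 2 & exit"))
    # With the background job's output redirected, EOF arrives right away.
    print(time_to_eof("sleep 2 >/dev/null 2>&1 & exit"))
```

In the first call the shell is gone almost immediately, but the reader still waits for the sleep to finish, which is the same shape as the hang on logout. Redirecting the background job's output, as in the second call, removes the wait in this local setup.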
>
> > My fear is that by putting this into the CVS tree, even the portable
> > version only, we will end up with another 2.3.0pX fiasco, where we
> > suddenly learn what the downfall of the patch is months after the patch
> > is applied and almost forgotten about.
>
I provided the patch only to be helpful to the openssh community. We and
others have been using it on (RedHat and SuSE) Linux production machines
for over a week without problems. For us, the alternative was to switch
back to using ssh.
Linux is the only environment where the patch (restricted to Protocol 2)
has been subject to extensive testing. But of course, with code this
complex, it is extremely difficult to analyze all possible scenarios.
> yeah - I would much prefer an (avoidable) hang on logout to a potential
> data loss.
>
At the very least, the patch may provide an important clue to solving this
bug. In particular, the fact that workarounds for unusual return values
under HP-UX and Linux (according to the above, the only two OSes where the
bug manifests itself) appear in chan_shutdown_read may be relevant.
I'm afraid I can't invest any more time on this patch. However, I can
provide a few questions that perhaps the openssh community can address, in
order to resolve the issues that have been raised here.
QUERIES:
1. Does sleep 20&;exit hang on any OS's other than HP-UX and Linux?
2. Does Protocol 1 lead to data loss when the patch is applied to
openssh-2.9 on BSD?
3. Does chan_shutdown_read really get called under Protocol 1?
When I insert a debug statement at the beginning of chan_shutdown_read
and run with sshd -d,
ssh -v -o Protocol=1 -oForwardX11=no wizard dd if=/dev/zero bs=1024 count=100 | wc -c
does not seem to even call chan_shutdown_read at all under Linux!
This explains why the patch neither fixes the hang-on-exit bug nor leads to
data loss with Protocol 1 under Linux.
4. Has anyone seen a case where Protocol 1 leads to data loss when the
patch is applied to openssh-2.9p1 on Linux?
5. Has anyone seen a case where Protocol 1 leads to data loss when the
patch is applied to openssh-2.9p1 on HP-UX?
6. Has anyone seen a case where Protocol 2 leads to data loss when the
patch is applied to openssh-2.9p1 on Linux?
7. Has anyone seen a case where Protocol 2 leads to data loss when the
patch is applied to openssh-2.9p1 on HP-UX?
8. Has anyone seen a case where Protocol 2 leads to data loss on any OS?
This is the most crucial question.
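
For anyone who wants to collect answers to the data-loss queries in a repeatable way, the following Python sketch wraps the dd test from query 3 and compares the byte count against the 102400 bytes that dd bs=1024 count=100 should produce. The function names are hypothetical, "wizard" is simply the host from the command above, and the Protocol option is copied from that command, so it assumes a client of the 2.9 era that still accepts it.

```python
import subprocess
import sys

EXPECTED_BYTES = 1024 * 100  # dd bs=1024 count=100


def count_output_bytes(argv):
    """Run argv and return the number of bytes it wrote to stdout."""
    result = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    return len(result.stdout)


def ssh_test_argv(host, protocol):
    """Build the ssh command line from query 3 for a given host and protocol."""
    return ["ssh", "-v", "-o", "Protocol=%d" % protocol, "-oForwardX11=no",
            host, "dd", "if=/dev/zero", "bs=1024", "count=100"]


def lost_bytes(argv, expected=EXPECTED_BYTES):
    """Return how many bytes short of `expected` the command's output was."""
    return expected - count_output_bytes(argv)


if __name__ == "__main__" and sys.argv[1:]:
    # Usage (assumed): python check_loss.py wizard
    host = sys.argv[1]
    for protocol in (1, 2):
        print("Protocol", protocol, "lost bytes:",
              lost_bytes(ssh_test_argv(host, protocol)))
```

A result of 0 for a given protocol means the full output arrived. Because data loss may be intermittent, a single clean run proves little, so the check is best repeated many times on each OS and with each of the patched and unpatched builds.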
-- John Bowman
University of Alberta
http://www.math.ualberta.ca/~bowman