scp completes but ssh subprocess in deadlock with sshd
Nicolas.Williams at ubsw.com
Wed Mar 13 05:47:46 EST 2002
We're seeing the same problem here, but with 2.9p2 clients and 3.0.2p1
servers on Solaris. At first it seems to be reliably reproducible,
but on closer inspection it looks non-deterministic.
I suspect a race.
We have captured some useful debug messages + other info which I will be
posting to bugzilla.
In all cases, whether or not the ssh hangs, we see that both the server
and the client have called channel_free() to free channel 0 (the
session), that both have closed both sides of the channel, that the
server sent the exit-status message, and that the client received it. The
only difference is that, in the hanging case, the client ends up in a
select() without a timeout, selecting for read on the SSH connection socket.
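To illustrate why a select() without a timeout can park the client forever, here is a minimal sketch. It is not OpenSSH code: wait_readable() and demo_idle_then_ready() are hypothetical helpers, and a local socketpair stands in for the SSH connection socket. With a NULL timeout, select() returns only once the descriptor becomes readable, so if the peer never sends anything more, the call never returns.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (not from OpenSSH): wait until fd is readable.
 * timeout == NULL means "block indefinitely", which is the situation
 * described above. Returns the select() result: 1 if readable,
 * 0 if the timeout expired, -1 on error. */
static int wait_readable(int fd, struct timeval *timeout)
{
    fd_set readset;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readset);
    FD_SET(fd, &readset);
    return select(fd + 1, &readset, NULL, NULL, timeout);
}

/* Usage sketch: an idle socket is not readable until the peer writes.
 * Returns 0 if both observations hold, -1 otherwise. */
static int demo_idle_then_ready(void)
{
    int sv[2];
    struct timeval tv = { 0, 100000 };   /* 100 ms, so the idle case returns */
    int idle, ready;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Nothing has been written yet: with a timeout we get 0 back.
     * With timeout == NULL this call would simply never return. */
    idle = wait_readable(sv[0], &tv);

    /* Once the peer writes, even the no-timeout form returns at once. */
    if (write(sv[1], "x", 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    ready = wait_readable(sv[0], NULL);

    close(sv[0]);
    close(sv[1]);
    return (idle == 0 && ready == 1) ? 0 : -1;
}
```

The sketch only demonstrates the blocking behaviour of select(); it says nothing about why the client reaches that call in the first place.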
This is really weird; there should be no calls to channel_free() between
the (compat20 && session_closed && !channel_still_open()) check at the top
of the client loop and the call to client_wait_until_can_do_something(),
so I don't see how there can be a race condition. Yet there it is.
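The ordering I have in mind can be modelled with a toy loop. This is only a sketch under stated assumptions: struct toy_client, enum toy_event and toy_client_loop() are hypothetical names, the scripted event array stands in for whatever wakes client_wait_until_can_do_something(), and treating "channel freed" as also setting session_closed is a simplification of mine, not a statement about the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy state. The field names echo the identifiers quoted above, but this
 * struct is hypothetical and is not OpenSSH's. */
struct toy_client {
    bool compat20;
    bool session_closed;
    int  open_channels;
};

enum toy_event { EV_OTHER, EV_CHANNEL_FREED };

static bool channel_still_open(const struct toy_client *c)
{
    return c->open_channels > 0;
}

/* Returns true if the loop leaves through the check at the top, false if it
 * would sit in the wait step with nothing left to wake it (the hang). */
static bool toy_client_loop(struct toy_client *c,
                            const enum toy_event *events, size_t nevents)
{
    size_t next = 0;

    assert(c != NULL);
    for (;;) {
        /* Exit check at the top of the loop. */
        if (c->compat20 && c->session_closed && !channel_still_open(c))
            return true;

        /* Stand-in for client_wait_until_can_do_something(): a select()
         * without timeout. If no event is left, it would block forever. */
        if (next == nevents)
            return false;

        /* Process whatever woke us. In this model state only changes here,
         * after the wait, never between the check and the wait. */
        if (events[next++] == EV_CHANNEL_FREED) {
            c->open_channels--;
            c->session_closed = true;   /* simplifying assumption */
        }
    }
}
```

In this model the loop can block only if the exit condition is false on entry to the wait and no further event arrives, which is why the hang we observe is so puzzling.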
In any case scp does exit even if the ssh hangs at the end.
I am baffled.
On Tue, Mar 12, 2002 at 11:27:19AM +1300, Adrian Pronk wrote:
> I've just built openssh 3.1 for my Redhat 5.1 system (running on a 486
> DX-66) using the latest zlib and openssl libraries.
> Connecting to the machine with ssh seems to work fine (although it takes a
> while to initiate a connection).
> But when I transfer a file to the machine with scp, it seems to work fine
> and the scp completes, but an ssh sub-process remains behind on the client
> and an sshd sub-process remains behind on the host. When I strace them,
> the client is waiting on a socket and the host is waiting on three
> different fd's (under 5.1, it's hard to tell what they are without making an
> effort :) ).
> I did not compile the system on the target machine (which is my firewall).
> My old development machine was a RH 5.1 box. I bought a new box recently
> and put RH 7.2 on it. I copied the development RH 5.1 file system on to it
> (including /dev). I then chroot'ed to that directory, mounted a new /proc
> and had my 5.1 development environment back. I compiled (make install)
> openSSL, zlib, openSSH on this and copied the likely output files to the
> target machine. I wouldn't think this development environment would break
> Does anyone know off the top of their heads what the problem might be? If
> not, I'll get stuck in and have a look at the code and see if I can see
> openssh-unix-dev at mindrot.org mailing list