SCP client prints out "lost connection" error message occasionally

JCA 1.41421 at gmail.com
Sat Apr 18 09:48:16 EST 2009


I am using the OpenSSH client (version 5.2p1) on a Linux box L to
interact with an embedded SSH server S. When carrying out a recursive
transfer from S to L by means of the scp command issued on L (S does
not support sftp), the client occasionally prints out a "lost
connection" error message at the very end of the transfer.

After some debugging I found out that the error message (as printed
out from lostconn() in scp.c) occurs because the ssh process in L,
spawned by the scp command, has already terminated, but the scp
command still wants to write something to the pipe it uses to
communicate with this ssh process. I have observed a few things of
interest here.
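
The failure mode described above can be modelled outside OpenSSH. What
follows is a minimal sketch in Python, not OpenSSH code: a forked child
stands in for the ssh process, the parent stands in for scp, and the
name scp_final_write and the single byte written are assumptions made
purely for illustration. In scp.c, lostconn() appears to be installed
as the SIGPIPE handler; Python ignores SIGPIPE by default, so here the
same condition shows up as an EPIPE error (BrokenPipeError) on the
write instead.

```python
import os


def scp_final_write(ssh_exits_first: bool) -> str:
    """Model scp writing to the pipe it shares with its ssh child.

    Hypothetical helper: returns "lost connection" when the write hits a
    pipe with no reader left, and "ok" otherwise.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the ssh process spawned by scp.
        os.close(write_fd)
        if not ssh_exits_first:
            # Stay alive until the parent's byte (or EOF) arrives.
            os.read(read_fd, 1)
        os._exit(0)

    # Parent: stands in for scp. Close our copy of the read end so the
    # child is the only reader of the pipe.
    os.close(read_fd)
    if ssh_exits_first:
        # Make sure the reader is really gone before we write.
        os.waitpid(pid, 0)
    try:
        os.write(write_fd, b"\0")
        result = "ok"
    except BrokenPipeError:
        # EPIPE: nobody holds the read end any more.
        result = "lost connection"
    finally:
        os.close(write_fd)
    if not ssh_exits_first:
        os.waitpid(pid, 0)
    return result


if __name__ == "__main__":
    print(scp_final_write(ssh_exits_first=True))
    print(scp_final_write(ssh_exits_first=False))
```

The only difference between the two calls is whether the reader has
already exited when the writer issues its last write, which is the
ordering the error message seems to depend on.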

First, the traces for the SSH server in S reveal that, in all cases
(i.e. whether or not the "lost connection" error is printed out by the
client), the exchange completes successfully. All the files that
have to be transferred are transferred, with no data missing
in the transferred files. More to the point, the traces show that the
server starts the closing phase by sending an exit-status
SSH_MSG_CHANNEL_REQUEST message followed by an SSH_MSG_CHANNEL_EOF
message and an SSH_MSG_CHANNEL_CLOSE message, to which the OpenSSH
client at L replies with an SSH_MSG_CHANNEL_CLOSE message of its own.
The session is closed correctly, as far as the server in S is
concerned.
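
The closing sequence just described can be checked mechanically against
a trace. The sketch below is only an illustration: the trace format (a
list of (sender, message number) pairs) and the name closes_cleanly are
assumptions, not the format of any real server log. The message numbers
are the ones assigned in RFC 4254.

```python
# Message numbers from RFC 4254 (SSH Connection Protocol).
SSH_MSG_CHANNEL_DATA = 94
SSH_MSG_CHANNEL_EOF = 96
SSH_MSG_CHANNEL_CLOSE = 97
SSH_MSG_CHANNEL_REQUEST = 98

# Expected tail of a trace: (sender, message number) pairs, in order.
# "S" is the server, "L" is the OpenSSH client (hypothetical labels).
CLEAN_CLOSE = [
    ("S", SSH_MSG_CHANNEL_REQUEST),  # carries the exit-status request
    ("S", SSH_MSG_CHANNEL_EOF),
    ("S", SSH_MSG_CHANNEL_CLOSE),
    ("L", SSH_MSG_CHANNEL_CLOSE),
]


def closes_cleanly(trace):
    """Return True if the trace ends with the clean-close sequence."""
    return trace[-len(CLEAN_CLOSE):] == CLEAN_CLOSE


if __name__ == "__main__":
    trace = [("S", SSH_MSG_CHANNEL_DATA)] + CLEAN_CLOSE
    print(closes_cleanly(trace))
```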

Second, if I modify ssh.c in the OpenSSH code so that before exiting
main() the program sleeps for one second, the "lost connection" error
message never appears.

Third, the ssh process always exits with a 0 return value.

I can see this "lost connection" issue only when L and S are connected
via a fast network. By this I mean that I don't see it with a 100 Mbps
or a 10 Mbps network, but I do with a 1 Gbps network.
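
One way to put a number on "occasionally" at each link speed is to run
the same transfer many times and count how often the message shows up
on stderr. The following is a rough sketch under stated assumptions:
count_lost_connections is a hypothetical helper, and the host and paths
in the usage comment are placeholders, not values from my setup.

```python
import subprocess


def count_lost_connections(command, runs):
    """Run command `runs` times; count runs whose stderr contains
    the text "lost connection"."""
    lost = 0
    for _ in range(runs):
        proc = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
        )
        if b"lost connection" in proc.stderr:
            lost += 1
    return lost


# Example use (placeholder host and paths):
#   count_lost_connections(
#       ["scp", "-r", "user@S:/remote/dir", "/tmp/dest"], 100)
```

Comparing the counts obtained over the 10 Mbps, 100 Mbps and 1 Gbps
links would show whether the failure rate really tracks link speed.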

Any ideas on how to characterize this further?


More information about the openssh-unix-dev mailing list