close then select of stderr fd in client (openssh)

Phil Howard phil-openssh-unix-dev at ipal.net
Sun Jul 23 16:53:06 EST 2000


Under certain circumstances (repeatable with a workaround) the client in
openssh-2.1.1p3 and p4 closes file descriptors and then calls select()
with the stderr one in the write fd_set.  The cause appears to be that
stdin/stdout/stderr are closed before the last of the stderr data has
been written to stderr.

This occurs when a tty is not allocated.  The error itself occurs on the
client side, so perhaps it is the timing or order of data coming from
the server that triggers it.  It occurs on Solaris 7, Slackware 7.0,
Slackware 3.4, and Redhat 6.0, with each of them used as either client
or server in various combinations.  In all cases protocol version 2 is
configured.


Here is a simple example with Slackware 7.0 as client and server:

phil@procyon:/home/phil 1311> ssh izar 'ls this_file_does_not_exist'
ls: select: Bad file descriptor
phil@procyon:/home/phil 1312> ssh izar 'ls this_file_does_not_exist;sleep 1'
ls: this_file_does_not_exist: No such file or directory
phil@procyon:/home/phil 1313>


Another example with Solaris 7 client and Redhat 6.0 server:

phil@sirius:/home/phil 57> ssh mira 'ls this_file_does_not_exist'
ls: select: Bad file number
phil@sirius:/home/phil 58> ssh mira 'ls this_file_does_not_exist;sleep 1'
ls: this_file_does_not_exist: No such file or directory
phil@sirius:/home/phil 59>


The problem also occurs when client and server are the same machine, so
physical network timings aren't expected to be the trigger:

phil@procyon:/home/phil 1315> ssh procyon 'ls this_file_does_not_exist'
ls: select: Bad file descriptor
phil@procyon:/home/phil 1316>


I ran strace on ssh -v and discovered that the following syscall events:

close(6)                                = 0
select(7, [3], [3 6], NULL, NULL)       = -1 EBADF (Bad file descriptor)

occurred in the failing case.  Notice the 6 in the write fd_set (3rd arg).
The successful case (using the 1 second sleep) looked like:

close(6)                                = 0
select(7, [3], [3], NULL, NULL)         = 1 (out [3])
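
The failing pattern can be reproduced outside of ssh.  The following is
a minimal sketch, not code from OpenSSH: it assumes a throwaway pipe
stands in for the client's stderr descriptor, and the function name
select_after_close() is made up for this illustration.  It closes a
descriptor and then leaves it in the write fd_set passed to select(),
as in the failing trace above.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from OpenSSH): close the write end of a pipe,
 * then hand the stale descriptor to select() in the write set.
 * Returns the errno reported by select(), 0 if select() unexpectedly
 * succeeds, or -1 if the pipe could not be created.
 */
int select_after_close(void)
{
    int fds[2];
    fd_set writeset;
    struct timeval tv = {0, 0};   /* poll only, never block */
    int stale, rc, err;

    if (pipe(fds) < 0)
        return -1;

    stale = fds[1];
    close(stale);                 /* corresponds to close(6) in the trace */

    FD_ZERO(&writeset);
    FD_SET(stale, &writeset);     /* the closed fd is still in the write set */

    rc = select(stale + 1, NULL, &writeset, NULL, &tv);
    err = (rc < 0) ? errno : 0;

    close(fds[0]);
    return err;
}
```

On the systems I would expect this to be run on, the returned value
should be EBADF, which is the same error strace shows for the client.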

So regardless of any failings that may exist on the server side, the
client is clearly doing the wrong thing at times with respect to the
building of the write fd_set for select().  I'm too unfamiliar with
the organization of the code (it's jumping around to too many different
functions for me to keep track of in clientloop.c) to really figure out
exactly why this is happening.  I can just see that it is definitely
happening.
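
For illustration only, here is a sketch of the kind of bookkeeping that
would keep a closed descriptor out of the write fd_set.  It is not taken
from clientloop.c, and the struct and function names (out_stream,
out_stream_close, out_stream_prepare) are hypothetical.  The assumption
is that the descriptor is set to -1 when it is closed and that the code
building the fd_set checks for that.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical per-stream state; not the structure OpenSSH uses. */
struct out_stream {
    int fd;            /* -1 once the descriptor has been closed */
    size_t pending;    /* bytes still waiting to be written */
};

/* Close the descriptor and mark it invalid so it is never selected on. */
void out_stream_close(struct out_stream *s)
{
    if (s->fd >= 0) {
        close(s->fd);
        s->fd = -1;
    }
}

/*
 * Add the stream to the write set only if it is still open and has
 * data pending.  Returns the updated highest descriptor number.
 */
int out_stream_prepare(const struct out_stream *s, fd_set *writeset, int maxfd)
{
    if (s->fd >= 0 && s->pending > 0) {
        FD_SET(s->fd, writeset);
        if (s->fd > maxfd)
            maxfd = s->fd;
    }
    return maxfd;
}
```

This only avoids the EBADF from select().  It does nothing about the
stderr data that is still pending when the close happens, so a real fix
would presumably also have to delay the close until that data has been
written.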

At first I thought the bug was on the server side, so I was doing strace
of sshd -d to see what was happening.  There definitely is a difference
in the sequence of events on the server side for the failing and successful
cases.  This may be triggering the problem on the client side, or just be
the result of it; I don't know.


Here's documentation I have captured:

The "failure" and "success" names are the failure and success cases.
The "combine" is the failure and success cases interleaved with the
difference set aside as its own block of lines.

Server straces of sshd -d:

http://phil.ipal.org/openssh/ssh-strace-servers-combine.txt
http://phil.ipal.org/openssh/ssh-strace-servers-failure.txt
http://phil.ipal.org/openssh/ssh-strace-servers-success.txt

Client straces of ssh -v:

http://phil.ipal.org/openssh/ssh-strace-clients-combine.txt
http://phil.ipal.org/openssh/ssh-strace-clients-failure.txt
http://phil.ipal.org/openssh/ssh-strace-clients-success.txt

In the combine files, the indicator "S-" is on each line from the
success case, and "-F" is on each line from the failure case.  The
blocks of differences are set apart with a row of 77 equal signs.
The interesting parts are near the bottom of each file, but the
whole thing is included to make sure all relevant information is
there.


I hope someone who understands the organization of the client code can
figure out the cause.  Since I'm in the USA I can't contribute back a
patch even if I do find it.  Again, this is all protocol version 2
as I have both clients and servers configured to do version 2 only
and all keys are DSA.  If you have trouble reproducing it, note that it
does not always occur; give it several tries.  Another factor that may
be involved is that I have no passphrase for the key (but I don't really
expect this to be relevant).

-- 
| Phil Howard - KA9WGN | My current websites: linuxhomepage.com, ham.org
| phil  (at)  ipal.net +----------------------------------------------------
| Dallas - Texas - USA | phil-evaluates-email-ads-750-dollars-each at ipal.net




