Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?

Goldburt, Dan Dan.Goldburt at dowjones.com
Thu Sep 14 08:17:28 EST 2006


Hi,


To recap, I'm establishing one master ssh connection and opening many
sessions through that one master connection.

Often I get "select: Bad file descriptor" errors and the server thrashes
at 100% CPU. The symptoms are very similar to those in this post:
http://sourceware.org/ml/cygwin/2001-09/msg01217.html
But the solution there doesn't work for OpenSSH.

Initially I believed that the number of file descriptors being opened
was overrunning the fd_set. But it doesn't seem like I'm overrunning
FD_SETSIZE. On the server, ulimit -n returns 256, and I also tried
the following.

Darren Tucker wrote:
> BTW did you try bumping FD_SETSIZE when configuring
> OpenSSH with your
> increased MAX_SESSIONS?
> eg: ./configure --with-cflags=-DFD_SETSIZE=256
>
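
A minimal sketch for double-checking the limits involved, assuming a
POSIX system. The helper names fd_fits_in_fd_set and print_fd_limits are
made up for illustration and are not part of OpenSSH:

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper: 1 if fd may be used with FD_SET on a plain fd_set. */
static int fd_fits_in_fd_set(int fd)
{
	return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: print compile-time and run-time descriptor limits.
 * Returns 0 on success, -1 if getrlimit() fails. */
static int print_fd_limits(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) == -1) {
		perror("getrlimit");
		return -1;
	}
	printf("FD_SETSIZE=%d RLIMIT_NOFILE soft=%lu hard=%lu\n",
	    (int)FD_SETSIZE, (unsigned long)rl.rlim_cur,
	    (unsigned long)rl.rlim_max);
	return 0;
}
```

Calling print_fd_limits() in the server process, rather than in a login
shell, shows the values that actually apply to the process doing the
select.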

I'm getting select() errors randomly, sometimes selecting up to file
descriptor 31, sometimes up to 38, sometimes up to 191, but sometimes I
don't get any errors (even for over 80 simultaneous sessions). But once
it happens, every subsequent select will fail (looping and
thrashing the server).

This seems to me to be a serious bug. Yes, I did increase MAX_SESSIONS
from 10 to 128, but that just made it easier to generate the error. You
can also reproduce it if you install sshd to listen on several ports
(start it with multiple -p arguments), and on each port establish a
multiplexed connection with many sessions. In any case, shouldn't
OpenSSH somehow handle the EBADF?
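
For reference, a minimal standalone sketch (not OpenSSH code) of how a
descriptor that is closed but still present in the set makes select()
fail with EBADF. Whether this is what happens in sshd is only a guess;
the function name is hypothetical:

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Select on a descriptor that has just been closed.
 * Returns the errno reported by select(), 0 if select() unexpectedly
 * succeeded, or -1 if the pipe could not be created.
 */
static int select_errno_on_closed_fd(void)
{
	int p[2], ret, saved;
	fd_set rset;
	struct timeval tv = { 0, 0 };	/* poll, do not block */

	if (pipe(p) == -1)
		return -1;
	close(p[0]);			/* p[0] is now invalid */

	FD_ZERO(&rset);
	FD_SET(p[0], &rset);		/* stale descriptor left in the set */
	ret = select(p[0] + 1, &rset, NULL, NULL, &tv);
	saved = (ret == -1) ? errno : 0;

	close(p[1]);
	return saved;
}
```

As long as the stale descriptor stays in the set, every later call fails
the same way, which would match the looping described above.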

The offending code is in serverloop.c, line 332:
ret = select((*maxfdp)+1, *readsetp, *writesetp, NULL, tvp);

I tried setting *maxfdp to FD_SETSIZE (as suggested in the post above),
but then I would get EBADF every single time. I also tried setting
*maxfdp to something small like 30, but then select would always come
back with 0 because the fd it was interested in was greater than 30.
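
The effect of the first argument can be shown with a small sketch,
assuming POSIX select(). select() only examines descriptors 0 through
nfds-1, so a value that is too small silently ignores the descriptor of
interest. The function name poll_readable is made up for illustration:

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Poll fd for readability without blocking, letting select() examine
 * only descriptors 0..nfds-1. Returns what select() returns.
 */
static int poll_readable(int fd, int nfds)
{
	fd_set rset;
	struct timeval tv = { 0, 0 };

	FD_ZERO(&rset);
	FD_SET(fd, &rset);
	return select(nfds, &rset, NULL, NULL, &tv);
}
```

With a readable descriptor fd, poll_readable(fd, fd + 1) reports one
ready descriptor, while poll_readable(fd, fd) returns 0 because fd lies
outside the range that select() looks at.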

The unix manpage defines "EBADF: One or more of the file descriptor sets
specified a file descriptor that is not a valid open file descriptor."
(http://www.scit.wlv.ac.uk/cgi-bin/mansec?3C+FD_SET). I modified the
code to handle the EBADF error. An example of what I'm printing right
now is "select: EBADF (bad file descriptor), maxfdp=38 FD_SETSIZE=256
readsetp=4 writesetp=4". Here is the code following the select:

ret = select(..);

if (ret == -1) {
	memset(*readsetp, 0, *nallocp);
	memset(*writesetp, 0, *nallocp);
	if (errno == EBADF) {
		/* Note: sizeof(*readsetp) is the size of a pointer, not of the set. */
		error("select: EBADF (bad file descriptor), maxfdp=%d "
		    "FD_SETSIZE=%d readsetp=%d writesetp=%d", (*maxfdp),
		    (int)FD_SETSIZE, (int)sizeof(*readsetp),
		    (int)sizeof(*writesetp));
		fatal("Bad file descriptor loop");
	}
	else if (errno != EINTR) {
		error("select: %.100s", strerror(errno));
	}
}
.
.

How can I print something to best debug the problem? Anybody have a best
guess why I'm getting EBADF? Was a file descriptor unexpectedly closed
but we are still trying to select on it?
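
One possible way to test the closed-descriptor theory, as a sketch only:
walk the set and probe each member with fcntl(fd, F_GETFD), which fails
with EBADF for a descriptor that is not open. The helper name
report_stale_fds is hypothetical, and the loop is capped at FD_SETSIZE,
which is an assumption that may not hold for dynamically sized sets:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/select.h>

/*
 * Hypothetical debugging helper: report every descriptor that is set in
 * `set` but is not an open descriptor. Returns how many were found.
 */
static int report_stale_fds(const char *name, fd_set *set, int maxfd)
{
	int fd, stale = 0;

	for (fd = 0; fd <= maxfd && fd < FD_SETSIZE; fd++) {
		if (!FD_ISSET(fd, set))
			continue;
		if (fcntl(fd, F_GETFD) == -1 && errno == EBADF) {
			fprintf(stderr, "%s: fd %d is set but not open\n",
			    name, fd);
			stale++;
		}
	}
	return stale;
}
```

In the error path this would have to run before the memset calls, since
those clear the sets, and errno should be saved first because fcntl
overwrites it. For example: report_stale_fds("readset", *readsetp,
*maxfdp).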



More information about the openssh-unix-dev mailing list