Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?

Wed Sep 13 00:38:36 EST 2006

> Darren Tucker wrote:
> 
> Goldburt, Dan wrote:
> >
> > 1. What does the cygwin limitation bound my max sessions to?
>
> I'm not sure, actually.  You seem to be hitting some limit at 31
> descriptors before the fd_set one, which should be at 64 (3 per session
> = ~20 concurrent).  What does "ulimit -n" report the descriptor limit
> as, and do you have some local processes using some of them?

ulimit -n on cygwin reports 256 open files max. Is there a per-process
limit? I'm thinking specifically about setdtablesize() (see
http://sourceware.org/ml/cygwin/2000-09/msg00286.html)
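
To compare the two limits discussed above from inside a process, something along these lines might help. It is only a sketch: getrlimit() and FD_SETSIZE are standard, but the helper names are made up for illustration, and whether Cygwin enforces some other, lower limit elsewhere is exactly the open question.

```c
#include <limits.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>

/*
 * Return the soft per-process descriptor limit (the value "ulimit -n"
 * reports), or -1 if it cannot be read.
 */
long
soft_fd_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)LONG_MAX)
		return LONG_MAX;
	return (long)rl.rlim_cur;
}

/* Highest descriptor number a stock fd_set can represent. */
int
max_selectable_fd(void)
{
	return FD_SETSIZE - 1;
}

/* Hypothetical helper: print both limits side by side. */
void
report_fd_limits(void)
{
	printf("soft RLIMIT_NOFILE: %ld\n", soft_fd_limit());
	printf("FD_SETSIZE:         %d\n", FD_SETSIZE);
}
```

If the first number is larger than the second, the process can be handed descriptors that select() with a stock fd_set cannot watch.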

> > 2. I need to make sure if I do still accidentally overrun the fd_set, I
> > will not crash sshd. Right now it goes into an infinite loop spitting
> > out "select: Bad file descriptor" and taking up 100% CPU. Surely this is
> > a bug that needs to be patched?
> 
> Maybe, but it only occurs with modified code, right?

Not necessarily. The only modification was to increase MAX_SESSIONS per
connection from 10 to 128. But even without the change, let's say I have
one multiplexed connection that is hosting 10 sessions. I can also
simultaneously open 15 regular ssh connections, for a total of 25
sessions. At roughly 3 descriptors per session that is about 75
descriptors, so there is a good chance I will overrun the 64-entry
fd_set.

> > 3. Any chance I can overcome the limitation from inside sshd? How do I
> > implement the following:
> >> To make this work, you would probably need to break the
> >> select into FD_SETSIZE chunks somehow.
> 
> I was thinking of overloading select and associated macros in the compat
> library but it's probably not trivial.

Kudos to anyone who attempts that. It would make for a very robust
solution, IMHO. Even better would be for this change to be made in the
Cygwin select() code. That is what sshd is using, correct? There also
seems to be some confusion about whether winsock's select()
implementation is being used in cygwin or not (see
http://sourceware.org/ml/cygwin/1999-12/msg00149.html).

> Damien said that the fd_sets were dynamically allocated but I'm not sure
> how that helps in the case where there's more than FD_SETSIZE descriptors.

I'm not sure either. What does he mean by dynamically allocated? I see
in serverloop.c (lines 638 - 642):
> max_fd = MAX(connection_in, connection_out);
> max_fd = MAX(max_fd, fdin);
> max_fd = MAX(max_fd, fdout);
> max_fd = MAX(max_fd, fderr);
> max_fd = MAX(max_fd, notify_pipe[0]);

which raises max_fd to the highest descriptor number we need to select
on. The next lines (655 and 645) use this value:

> /* Sleep in select() until we can do something. */
> wait_until_can_do_something(&readset, &writeset, &max_fd,
>     &nalloc, max_time_milliseconds);

and this is where I think the bug lives, though I haven't proved it. As
soon as max_fd reaches or exceeds FD_SETSIZE, we try to select on an fd
outside of the array size cygwin supports. This accounts for the
"select: Bad file descriptor" error, but more importantly, since the fd
we are actually interested in lies outside of the selectable range, the
returned bitmask will never change and we will never break out of the
"sleep in select" loop (see
http://sourceware.org/ml/cygwin/1999-11/msg00451.html).
I think this is a serious bug; can somebody please vet my analysis?

As far as I can tell, max_fd should be kept below FD_SETSIZE under
cygwin.
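
A minimal sketch of such a bound, assuming it is applied wherever a descriptor is added to a set. fd_set_checked() is a made-up name, not an existing OpenSSH function.

```c
#include <errno.h>
#include <sys/select.h>

/*
 * Hypothetical guard: add fd to set only if a stock fd_set can hold it,
 * and track the highest descriptor seen in *max_fd.
 * Returns 0 on success.  Returns -1 with errno set to EBADF for a
 * negative fd, or EMFILE for fd >= FD_SETSIZE, so the caller can refuse
 * the new session instead of passing select() a descriptor it cannot
 * represent.
 */
int
fd_set_checked(int fd, fd_set *set, int *max_fd)
{
	if (fd < 0) {
		errno = EBADF;
		return -1;
	}
	if (fd >= FD_SETSIZE) {
		errno = EMFILE;
		return -1;
	}
	FD_SET(fd, set);
	if (fd > *max_fd)
		*max_fd = fd;
	return 0;
}
```

With a check like this, max_fd can never reach FD_SETSIZE, at the cost of rejecting sessions once the descriptor numbers run out.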

_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev at mindrot.org
http://lists.mindrot.org/mailman/listinfo/openssh-unix-dev