Fischer, Bill Bill.Fischer at qwest.com
Sat Jan 14 03:32:30 EST 2006


We've found some undesirable behavior with respect to LoginGraceTime.  A
minor code change in session.c seems to clear it up, but now I'm asking
for help in better understanding the problem and determining if there
are any unexpected side effects of the change.

First, the code change:

$ diff orig_session.c session.c
<       alarm(0);
>       verbose("Clearing alarm in do_authenticated");
>       /*alarm(0);*/
>       signal(SIGALRM, SIG_IGN);

So, I replaced "alarm(0);" in do_authenticated with a call to verbose
and a "signal(SIGALRM, SIG_IGN);"
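For anyone comparing the two calls, here is a minimal sketch of how they
differ.  It is not sshd code: the function names, the handler and the
timer values are made up for illustration.  alarm(0) cancels the timer
that is pending at that moment, while signal(SIGALRM, SIG_IGN) leaves
any timer running and discards the signal when it arrives.

```c
#include <signal.h>
#include <unistd.h>

/* Set by the handler; volatile sig_atomic_t is safe to write there. */
static volatile sig_atomic_t alarm_fired = 0;

/* Hypothetical stand-in for a grace-time handler. */
static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1;
}

/* Original approach: arm a timer, then cancel it with alarm(0).
 * Returns the number of seconds that were left on the cancelled timer. */
unsigned int cancel_with_alarm0(void)
{
    signal(SIGALRM, on_alarm);
    alarm(30);          /* stand-in for the grace timer */
    return alarm(0);    /* cancels the pending timer */
}

/* Patched approach: leave the timer armed but ignore the signal.
 * Returns 1 if the handler ran, 0 if the signal was discarded. */
int ignore_sigalrm(void)
{
    alarm_fired = 0;
    signal(SIGALRM, on_alarm);
    alarm(1);                   /* timer is still pending ... */
    signal(SIGALRM, SIG_IGN);   /* ... but SIGALRM is now ignored */
    sleep(2);                   /* the timer expires during this sleep */
    return alarm_fired;
}
```

One consequence of the difference: if anything re-arms the timer after
the alarm(0) call, the old handler is still installed and will run,
whereas with SIG_IGN it will not.  Whether that is what happens in our
case is an open question, not something this sketch shows.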

Now, the problem description.

We are running OpenSSH 4.2p1 on Solaris 8 (Sparc) that has a recent
recommended patch cluster installed.

When we connect to this server using a variety of ssh clients (including
4.2p1), we noticed that sessions were dropping after about 10 minutes.
We changed the LoginGraceTime to 30 seconds and sure enough, sessions
were dropping in 30 seconds.  We were also seeing messages like:
"Timeout before authentication" in /var/adm/messages when the sessions
were dropping.  Setting LoginGraceTime to 0 (or something like 12 hours)
was the leading candidate for a workaround.
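The symptom is what one would expect from a grace timer that is armed
at connection time and never cleared.  A rough sketch of that pattern
follows, assuming a child process per session; run_session, the handler
and the exit code 255 are hypothetical and not taken from the sshd
source.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical grace-timer handler: end the process at once.
 * _exit() is async-signal-safe; exit() and printf() are not. */
static void grace_expired(int sig)
{
    (void)sig;
    _exit(255);
}

/* Fork a child that arms a timer of `grace` seconds and then stays busy
 * for `session` seconds without ever cancelling the timer.
 * Returns the child's exit status, or -1 on error. */
int run_session(unsigned int grace, unsigned int session)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        unsigned int left = session;

        signal(SIGALRM, grace_expired);
        alarm(grace);
        /* sleep() returns the unslept time if it is interrupted */
        while (left > 0)
            left = sleep(left);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With a grace time shorter than the session, for example
run_session(1, 3), the child is killed by its own timer; with
run_session(3, 1) it finishes normally.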

We noticed that if we removed/renamed the ~/.ssh/id_rsa and
~/.ssh/id_dsa files on the client side, the connections would stay up.
Similarly, an ssh -i /dev/null allowed the connections to stay up, but
that was an ugly solution at best.

It didn't matter if the id_[rd]sa keys matched an entry in the
authorized_keys2 file on the server.  The connections dropped after the
LoginGraceTime either way.

We'd been working on the problem for 'long enough' (> 60 man hours) and
couldn't find anything interesting on google or the archives, so I dug
into the code and came up with the possible solution above.

After digging into the code for a while, I tried putting
"UsePrivilegeSeparation no" in the sshd_config file and the problem
persisted, so I don't think it has anything to do with the privsep code.

I can't imagine this affects every Solaris 8 user that has id_[rd]sa
files or we would have seen something in our archive/google searches.
Perhaps a recent Solaris patch introduced a change in libc?  ... but
then why does it only break when id_[rd]sa files are present on the
client side?

Questions for you: 
A) Do you think the code change is a viable solution?  There may be
well-founded reasons to use alarm(0) instead and/or reasons to avoid using
signal(SIGALRM, SIG_IGN).  Early testing here shows it works (but we
haven't tested extensively yet).   ... if it appears to be viable, can
we get it included in the code base at some point? 

B) Are any of you willing/able to help us pursue the root cause further?
If so, I can provide more configuration information.


More information about the openssh-unix-dev mailing list