[Bug 1363] New: sshd gets stuck: select() in packet_read_seqnr waits indefinitely
bugzilla-daemon at bugzilla.mindrot.org
Mon Sep 17 13:02:31 EST 2007
http://bugzilla.mindrot.org/show_bug.cgi?id=1363
Summary: sshd gets stuck: select() in packet_read_seqnr waits indefinitely
Product: Portable OpenSSH
Version: 4.2p1
Platform: All
URL: http://marc.info/?t=117394251600035
OS/Version: All
Status: NEW
Keywords: patch
Severity: major
Priority: P2
Component: sshd
AssignedTo: bitbucket at mindrot.org
ReportedBy: openssh at fjarlq.com
Created an attachment (id=1348)
--> (http://bugzilla.mindrot.org/attachment.cgi?id=1348)
latest version of fix -- this has been tested
This bug was discussed on openssh-unix-dev in March 2007:
http://marc.info/?t=117394251600035
During the discussion, Darren Tucker created a fix for the problem and
I (Matt Day) revised and tested it. The latest version of the patch is
attached.
Original problem report:
I'm having a problem where sshd login sessions are occasionally
(as often as once a day) getting stuck indefinitely. I enabled debug
messages and got a backtrace of a stuck sshd, and I think I've found
the bug.
sshd version:
OpenSSH_4.2p1 FreeBSD-20050903, OpenSSL 0.9.7e-p1 25 Oct 2004
Uncommented lines (i.e., nondefault settings) in sshd_config:
LogLevel DEBUG
ClientAliveInterval 90
Subsystem sftp /usr/libexec/sftp-server
SSH client:
PuTTY version 0.58, default settings
OS/HW:
FreeBSD 6.1-RELEASE running on 64-bit x86 ("amd64" platform)
Executive summary:
The select() in packet_read_seqnr() waits indefinitely, resulting
in stuck SSH sessions when networking problems interfere with
key exchange. I would like to be able to set a timeout there, or
have SSH keepalives sent during key exchange.
Periodically (every 60 minutes) the SSH client initiates rekeying
via key exchange. Here's an example of a successful rekeying:
Mar 11 19:02:35 SSH2_MSG_KEXINIT received
Mar 11 19:02:35 SSH2_MSG_KEXINIT sent
Mar 11 19:02:35 kex: client->server aes256-ctr hmac-sha1 none
Mar 11 19:02:35 kex: server->client aes256-ctr hmac-sha1 none
Mar 11 19:02:35 SSH2_MSG_KEX_DH_GEX_REQUEST_OLD received
Mar 11 19:02:35 SSH2_MSG_KEX_DH_GEX_GROUP sent
Mar 11 19:02:35 expecting SSH2_MSG_KEX_DH_GEX_INIT
Mar 11 19:02:38 SSH2_MSG_KEX_DH_GEX_REPLY sent
Mar 11 19:02:38 set_newkeys: rekeying
Mar 11 19:02:38 SSH2_MSG_NEWKEYS sent
Mar 11 19:02:38 expecting SSH2_MSG_NEWKEYS
Mar 11 19:02:38 set_newkeys: rekeying
Mar 11 19:02:38 SSH2_MSG_NEWKEYS received
In the failure case, sshd gets stuck during key exchange. The SSH
session had been going fine for many hours, and then these were the
last messages it logged:
Mar 11 20:02:38 SSH2_MSG_KEXINIT received
Mar 11 20:02:38 SSH2_MSG_KEXINIT sent
Mar 11 20:02:38 kex: client->server aes256-ctr hmac-sha1 none
Mar 11 20:02:38 kex: server->client aes256-ctr hmac-sha1 none
Mar 11 20:02:38 SSH2_MSG_KEX_DH_GEX_REQUEST_OLD received
Mar 11 20:02:38 SSH2_MSG_KEX_DH_GEX_GROUP sent
Mar 11 20:02:38 expecting SSH2_MSG_KEX_DH_GEX_INIT
The user was idle when this happened, but had a program running
that was generating output. That program became tty-blocked after
about 30 minutes, presumably because sshd wasn't draining its output,
and that's when I noticed the user's sshd was stuck and got a
backtrace:
(gdb) where
#0 0x.. in select () from /lib/libc.so.6
#1 0x.. in packet_read_seqnr () from /usr/lib/libssh.so.3
#2 0x.. in packet_read () from /usr/lib/libssh.so.3
#3 0x.. in packet_read_expect () from /usr/lib/libssh.so.3
#4 0x.. in kexgex_server (kex=0x538900) at kexgexs.c:99
#5 0x.. in kex_setup () from /usr/lib/libssh.so.3
#6 0x.. in kex_input_kexinit () from /usr/lib/libssh.so.3
#7 0x.. in dispatch_run () from /usr/lib/libssh.so.3
#8 0x.. in process_buffered_input_packets () at serverloop.c:475
#9 0x.. in server_loop2 (authctxt=0x4) at serverloop.c:760
#10 0x.. in do_authenticated2 (authctxt=0x4) at session.c:2456
#11 0x.. in do_authenticated (authctxt=0x53a400) at session.c:227
#12 0x.. in main at sshd.c:1749
This backtrace agrees with the debug messages: it's in kexgex_server(),
calling packet_read_expect(SSH2_MSG_KEX_DH_GEX_INIT), which ultimately
calls select() from packet_read_seqnr().
The select() call in packet_read_seqnr() passes NULL as the timeout,
meaning it will wait forever. That explains why the comment above
it says "Note that no other data is processed until this returns,
so this function should not be used during the interactive session."
But this was an interactive session.
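To illustrate the difference between a NULL timeout and a bounded one,
here is a minimal sketch of a select() wait. It is not OpenSSH code:
the helper name wait_readable_once and its timeout_ms parameter are
made up for this example, and a negative timeout_ms is assumed to mean
"pass NULL and block forever", which is the behavior described above.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not an OpenSSH function): wait until fd is
 * readable.  If timeout_ms is negative, select() gets a NULL timeout
 * and blocks until data arrives, however long that takes.  Otherwise
 * select() gives up after roughly timeout_ms milliseconds.
 *
 * Returns 1 if fd is readable, 0 on timeout, -1 on error (errno set).
 */
int
wait_readable_once(int fd, int timeout_ms)
{
	fd_set readfds;
	struct timeval tv, *tvp = NULL;
	int r;

	FD_ZERO(&readfds);
	FD_SET(fd, &readfds);

	if (timeout_ms >= 0) {
		tv.tv_sec = timeout_ms / 1000;
		tv.tv_usec = (timeout_ms % 1000) * 1000;
		tvp = &tv;
	}

	r = select(fd + 1, &readfds, NULL, NULL, tvp);
	if (r > 0)
		return 1;
	return r;	/* 0 = timed out, -1 = error */
}
```

On a descriptor whose peer has gone silent, wait_readable_once(fd, 50)
returns 0 after about 50 ms, whereas wait_readable_once(fd, -1) does
not return until data arrives or an error occurs.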
I've set ClientAliveInterval in sshd_config so that SSH sessions
die in a timely manner when networking problems arise, but the
keepalive is apparently not sent during key exchange. The default
TCP keepalive on FreeBSD is unhelpful here; it only kicks in after
2 hours, and I need stuck SSH sessions to die a lot sooner. I want
to keep the FreeBSD TCP keepalive defaults.
Would it be possible for the select() in packet_read_seqnr to use
an optional timeout? Similarly, I believe the select() in
packet_write_wait has the same problem. Upon timeout, it would be
fine with me if the session died with an error logged. Alternatively,
if SSH keepalives were sent during key exchange, that would suffice.