ControlMaster failure of RH/CentOS 6.6.1p1-25

Mattson, Robert RMATTSON at harris.com
Tue Apr 26 13:52:37 AEST 2016


Dear OpenSSH Team,



We use OpenSSH ControlPath/multiplexed sessions heavily to avoid session handshake overheads in a system of CentOS machines performing remote shell commands. The commands are run at different times and take different amounts of time to complete. We're currently experiencing a failure of sshd which relies on timing to trigger a fault and elicit a failure. The failure is typically a process termination. Depending on the loop used to reproduce a crash, the time required can be between five minutes and one hour (approx).



On CentOS, failure messages look like [3], on RHEL, reproduced failure messages look like [4].



Previously RedHat supported and fixed a ~very~ similar issue (see ChangeLog entry[1]), however it appears we still see sshd crashes in our system on both CentOS and RHEL (which we know are compiled from the same source code) when using this feature. We are running the latest RPMS for OpenSSH - 6.6.1p1-25.el7_2.x86_64.rpm on both the client and server hosts.

When investigating the issue previously, RedHat documented the method we used to reproduce the failure in some detail[2].
I do swap the while loop described on that page for a different loop which is can elicit a failure sooner.

The loop used to produce the error in [4] was [5], while I have also written/used [6],[7],[8]. There is probably an infinite number of possible loops.



I tried to reproduce this in OpenBSD last May/June and did not see this failure, however, because of the issues around timing, not seeing a failure does not give me confidence there is no fault present.



Is there any guidance available on how this can be resolved?
ControlMaster is an important and very good feature of OpenSSH without which our remote execution system would not be nearly as efficient or scalable.



It is my intention to open a case with RH but sometimes it is difficult to progress such support cases, I'm hoping I can get some assistance to help get this stabilised from wherever I can find it.

Sincerely,

Rob Mattson




Dr Robert Mattson
Product Development
380 St Kilda Road,
Melbourne, Victoria, 3004
Australia

+61 3 9926 0000  phone
robert.mattson at harris.com<mailto:robert.mattson at harris.com>
www.c4i.com<http://www.c4i.com/>

[1] rpm -q --changelog openssh
* Wed Jul 15 2015 Jakub Jelen <jjelen at redhat.com<mailto:jjelen at redhat.com>> 6.6.1p1-15 + 0.9.3-9
...
- Fix race condition with auditing messages answers (#1240613)

[2] - https://access.redhat.com/solutions/1521923

[3] from /var/log/secure
Apr 26 10:42:58 host01 sshd[15138]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 2
Apr 26 10:42:58 host01 sshd[15138]: error: buffer_get_string_ret: buffer_get failed
Apr 26 10:42:58 host01 sshd[15138]: fatal: buffer_get_string: buffer error
<snip />
Apr 26 10:56:58 host01 sshd[15293]: fatal: mm_request_receive: read: bad msg_len 4096768
Apr 26 10:56:58 host01 sshd[15293]: pam_unix(sshd:session): session closed for user targetuser
Apr 26 10:56:58 host01 sshd[15296]: fatal: mm_request_receive: read: Connection reset by peer
Apr 26 10:56:58 host01 sshd[15296]: fatal: mm_request_send: write: Broken pipe
<snip />
Apr 26 10:58:58 host01 sshd[16028]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 53
Apr 26 10:58:58 host01 sshd[16028]: error: buffer_get_string_ret: buffer_get failed
Apr 26 10:58:58 host01 sshd[16028]: fatal: buffer_get_string: buffer error
Apr 26 10:58:58 host01 sshd[16028]: pam_unix(sshd:session): session closed for user targetuser
Apr 26 10:58:58 host01 sshd[16032]: fatal: mm_request_receive: read: Connection reset by peer
Apr 26 10:58:58 host01 sshd[16032]: fatal: mm_request_send: write: Broken pipe
<snip />
Apr 26 11:07:58 host01 sshd[16149]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 50
Apr 26 11:07:58 host01 sshd[16149]: error: buffer_get_string_ret: buffer_get failed
Apr 26 11:07:58 host01 sshd[16149]: fatal: buffer_get_string: buffer error
Apr 26 11:07:58 host01 sshd[16149]: pam_unix(sshd:session): session closed for user targetuser
Apr 26 11:07:58 host01 sshd[16152]: fatal: mm_request_receive: read: Connection reset by peer
Apr 26 11:07:58 host01 sshd[16152]: fatal: mm_request_send: write: Broken pipe
<snip />
Apr 26 11:12:58 host01 sshd[17475]: fatal: mm_request_receive: read: bad msg_len 4529920
Apr 26 11:12:58 host01 sshd[17475]: pam_unix(sshd:session): session closed for user targetuser
Apr 26 11:12:58 host01 sshd[17501]: fatal: mm_request_receive: read: Connection reset by peer
Apr 26 11:12:58 host01 sshd[17501]: fatal: mm_request_send: write: Broken pipe



[4]
time /usr/sbin/sshd -p 1026 -ddd
<snip />

debug2: fd 19 setting O_NONBLOCK
debug2: fd 11 setting O_NONBLOCK
debug2: fd 22 setting O_NONBLOCK
debug2: channel 4: rcvd eof
debug2: channel 4: output open -> drain
debug2: channel 4: obuf empty
debug2: channel 4: close_write
debug2: channel 4: output drain -> closed
debug2: channel 3: read 664 from efd 20
debug2: channel 4: read 664 from efd 22
debug2: channel 3: rwin 2097152 elen 664 euse 1
debug2: channel 3: sent ext data 664
debug2: channel 4: rwin 2097152 elen 664 euse 1
debug2: channel 4: sent ext data 664
debug3: mm_request_receive entering
debug3: monitor_read: checking request 124
buffer_get_ret: trying to get more bytes 68 than in buffer 63             <----*******
buffer_get_string_ret: buffer_get failed                                  <----*******
buffer_get_string: buffer error                                           <----*******
debug1: do_cleanup
debug1: PAM: cleanup
debug1: PAM: closing session
debug1: PAM: deleting credentials
debug3: PAM: sshpam_thread_cleanup entering

real    8m4.228s
user    0m0.506s
sys     0m1.884s
[root at rhel ~]# debug1: server_input_channel_open: ctype session rchan 12 win 2097152 max 32768
debug1: input_session_request
debug1: channel 5: new [server-session]
debug1: session_new: session 7
debug1: session_open: channel 5
debug1: session_open: session 7: link with channel 5
debug1: server_input_channel_open: confirm session
debug1: server_input_channel_req: channel 5 request env reply 0
debug1: session_by_channel: session 7 channel 5
debug1: session_input_channel_req: session 7 req env
debug2: Setting env 0: LANG=en_AU.UTF-8
debug1: server_input_channel_req: channel 5 request exec reply 1
debug1: session_by_channel: session 7 channel 5
debug1: session_input_channel_req: session 7 req exec
Starting session: command for joe from ::1 port 41194
debug3: mm_audit_run_command entering command sleep .2
debug3: mm_request_send entering: type 114
mm_request_send: write: Broken pipe
debug1: do_cleanup
debug3: PAM: sshpam_thread_cleanup entering
debug3: mm_request_send entering: type 124
mm_request_send: write: Broken pipe
[root at rhel ~]#



[5]
while true; do
        runCommand .22
        runCommand .2
        runCommand .2
        runCommand .2
        runCommand .2
        runCommand .2
        sleep .333
done

[6]
while true; do
       runCommand 3
       sleep .6
       runCommand 3
       sleep .6
       runCommand 3
       sleep .6
       runCommand 3
       sleep .6
       runCommand 3
       sleep .7
done



[7]
while true; do
    runCommand 3
    runCommand 1
    runCommand 1
    runCommand 1
    runCommand 1
    runCommand 2
    runCommand 2
    runCommand 2
    sleep 1
    runCommand 1
    sleep 1
done



[8]
while true; do
    runCommand 1
    runCommand 1
    runCommand 1
    runCommand 1
    sleep 1
    runCommand 1
    sleep 1
done




More information about the openssh-unix-dev mailing list