ControlMaster failure of RH/CentOS 6.6.1p1-25

Jakub Jelen jjelen at redhat.com
Tue Apr 26 19:16:11 AEST 2016


Hello Rob,
I am sorry for the caused problems. The previous attempt to resolve this 
issue, mentioned in the linked changelog didn't solve the issue completely.

This is also not caused by anything in openssh upstrem, but by our 
downstream audit patch and it is already addressed for RHEL7.3. Red Hat 
Support will be able to provide you testing package to address this issue.

Kind regards,
Jakub

On 04/26/2016 05:52 AM, Mattson, Robert wrote:
> Dear OpenSSH Team,
>
>
>
> We use OpenSSH ControlPath/multiplexed sessions heavily to avoid session handshake overheads in a system of CentOS machines performing remote shell commands. The commands are run at different times and take different amounts of time to complete. We're currently experiencing a failure of sshd which relies on timing to trigger a fault and elicit a failure. The failure is typically a process termination. Depending on the loop used to reproduce a crash, the time required can be between five minutes and one hour (approx).
>
>
>
> On CentOS, failure messages look like [3], on RHEL, reproduced failure messages look like [4].
>
>
>
> Previously RedHat supported and fixed a ~very~ similar issue (see ChangeLog entry[1]), however it appears we still see sshd crashes in our system on both CentOS and RHEL (which we know are compiled from the same source code) when using this feature. We are running the latest RPMS for OpenSSH - 6.6.1p1-25.el7_2.x86_64.rpm on both the client and server hosts.
>
> When investigating the issue previously, RedHat documented the method we used to reproduce the failure in some detail[2].
> I do swap the while loop described on that page for a different loop which is can elicit a failure sooner.
>
> The loop used to produce the error in [4] was [5], while I have also written/used [6],[7],[8]. There is probably an infinite number of possible loops.
>
>
>
> I tried to reproduce this in OpenBSD last May/June and did not see this failure, however, because of the issues around timing, not seeing a failure does not give me confidence there is no fault present.
>
>
>
> Is there any guidance available on how this can be resolved?
> ControlMaster is an important and very good feature of OpenSSH without which our remote execution system would not be nearly as efficient or scalable.
>
>
>
> It is my intention to open a case with RH but sometimes it is difficult to progress such support cases, I'm hoping I can get some assistance to help get this stabilised from wherever I can find it.
>
> Sincerely,
>
> Rob Mattson
>
>
>
>
> Dr Robert Mattson
> Product Development
> 380 St Kilda Road,
> Melbourne, Victoria, 3004
> Australia
>
> +61 3 9926 0000  phone
> robert.mattson at harris.com<mailto:robert.mattson at harris.com>
> www.c4i.com<http://www.c4i.com/>
>
> [1] rpm -q --changelog openssh
> * Wed Jul 15 2015 Jakub Jelen <jjelen at redhat.com<mailto:jjelen at redhat.com>> 6.6.1p1-15 + 0.9.3-9
> ...
> - Fix race condition with auditing messages answers (#1240613)
>
> [2] - https://access.redhat.com/solutions/1521923
>
> [3] from /var/log/secure
> Apr 26 10:42:58 host01 sshd[15138]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 2
> Apr 26 10:42:58 host01 sshd[15138]: error: buffer_get_string_ret: buffer_get failed
> Apr 26 10:42:58 host01 sshd[15138]: fatal: buffer_get_string: buffer error
> <snip />
> Apr 26 10:56:58 host01 sshd[15293]: fatal: mm_request_receive: read: bad msg_len 4096768
> Apr 26 10:56:58 host01 sshd[15293]: pam_unix(sshd:session): session closed for user targetuser
> Apr 26 10:56:58 host01 sshd[15296]: fatal: mm_request_receive: read: Connection reset by peer
> Apr 26 10:56:58 host01 sshd[15296]: fatal: mm_request_send: write: Broken pipe
> <snip />
> Apr 26 10:58:58 host01 sshd[16028]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 53
> Apr 26 10:58:58 host01 sshd[16028]: error: buffer_get_string_ret: buffer_get failed
> Apr 26 10:58:58 host01 sshd[16028]: fatal: buffer_get_string: buffer error
> Apr 26 10:58:58 host01 sshd[16028]: pam_unix(sshd:session): session closed for user targetuser
> Apr 26 10:58:58 host01 sshd[16032]: fatal: mm_request_receive: read: Connection reset by peer
> Apr 26 10:58:58 host01 sshd[16032]: fatal: mm_request_send: write: Broken pipe
> <snip />
> Apr 26 11:07:58 host01 sshd[16149]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 50
> Apr 26 11:07:58 host01 sshd[16149]: error: buffer_get_string_ret: buffer_get failed
> Apr 26 11:07:58 host01 sshd[16149]: fatal: buffer_get_string: buffer error
> Apr 26 11:07:58 host01 sshd[16149]: pam_unix(sshd:session): session closed for user targetuser
> Apr 26 11:07:58 host01 sshd[16152]: fatal: mm_request_receive: read: Connection reset by peer
> Apr 26 11:07:58 host01 sshd[16152]: fatal: mm_request_send: write: Broken pipe
> <snip />
> Apr 26 11:12:58 host01 sshd[17475]: fatal: mm_request_receive: read: bad msg_len 4529920
> Apr 26 11:12:58 host01 sshd[17475]: pam_unix(sshd:session): session closed for user targetuser
> Apr 26 11:12:58 host01 sshd[17501]: fatal: mm_request_receive: read: Connection reset by peer
> Apr 26 11:12:58 host01 sshd[17501]: fatal: mm_request_send: write: Broken pipe
>
>
>
> [4]
> time /usr/sbin/sshd -p 1026 -ddd
> <snip />
>
> debug2: fd 19 setting O_NONBLOCK
> debug2: fd 11 setting O_NONBLOCK
> debug2: fd 22 setting O_NONBLOCK
> debug2: channel 4: rcvd eof
> debug2: channel 4: output open -> drain
> debug2: channel 4: obuf empty
> debug2: channel 4: close_write
> debug2: channel 4: output drain -> closed
> debug2: channel 3: read 664 from efd 20
> debug2: channel 4: read 664 from efd 22
> debug2: channel 3: rwin 2097152 elen 664 euse 1
> debug2: channel 3: sent ext data 664
> debug2: channel 4: rwin 2097152 elen 664 euse 1
> debug2: channel 4: sent ext data 664
> debug3: mm_request_receive entering
> debug3: monitor_read: checking request 124
> buffer_get_ret: trying to get more bytes 68 than in buffer 63             <----*******
> buffer_get_string_ret: buffer_get failed                                  <----*******
> buffer_get_string: buffer error                                           <----*******
> debug1: do_cleanup
> debug1: PAM: cleanup
> debug1: PAM: closing session
> debug1: PAM: deleting credentials
> debug3: PAM: sshpam_thread_cleanup entering
>
> real    8m4.228s
> user    0m0.506s
> sys     0m1.884s
> [root at rhel ~]# debug1: server_input_channel_open: ctype session rchan 12 win 2097152 max 32768
> debug1: input_session_request
> debug1: channel 5: new [server-session]
> debug1: session_new: session 7
> debug1: session_open: channel 5
> debug1: session_open: session 7: link with channel 5
> debug1: server_input_channel_open: confirm session
> debug1: server_input_channel_req: channel 5 request env reply 0
> debug1: session_by_channel: session 7 channel 5
> debug1: session_input_channel_req: session 7 req env
> debug2: Setting env 0: LANG=en_AU.UTF-8
> debug1: server_input_channel_req: channel 5 request exec reply 1
> debug1: session_by_channel: session 7 channel 5
> debug1: session_input_channel_req: session 7 req exec
> Starting session: command for joe from ::1 port 41194
> debug3: mm_audit_run_command entering command sleep .2
> debug3: mm_request_send entering: type 114
> mm_request_send: write: Broken pipe
> debug1: do_cleanup
> debug3: PAM: sshpam_thread_cleanup entering
> debug3: mm_request_send entering: type 124
> mm_request_send: write: Broken pipe
> [root at rhel ~]#
>
>
>
> [5]
> while true; do
>          runCommand .22
>          runCommand .2
>          runCommand .2
>          runCommand .2
>          runCommand .2
>          runCommand .2
>          sleep .333
> done
>
> [6]
> while true; do
>         runCommand 3
>         sleep .6
>         runCommand 3
>         sleep .6
>         runCommand 3
>         sleep .6
>         runCommand 3
>         sleep .6
>         runCommand 3
>         sleep .7
> done
>
>
>
> [7]
> while true; do
>      runCommand 3
>      runCommand 1
>      runCommand 1
>      runCommand 1
>      runCommand 1
>      runCommand 2
>      runCommand 2
>      runCommand 2
>      sleep 1
>      runCommand 1
>      sleep 1
> done
>
>
>
> [8]
> while true; do
>      runCommand 1
>      runCommand 1
>      runCommand 1
>      runCommand 1
>      sleep 1
>      runCommand 1
>      sleep 1
> done
>
>
> _______________________________________________
> openssh-unix-dev mailing list
> openssh-unix-dev at mindrot.org
> https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev

-- 
Jakub Jelen
Associate Software Engineer
Security Technologies
Red Hat



More information about the openssh-unix-dev mailing list