ControlMaster failure of RH/CentOS 6.6.1p1-25

Nico Kadel-Garcia nkadel at gmail.com
Tue Apr 26 21:51:52 AEST 2016


On Mon, Apr 25, 2016 at 11:52 PM, Mattson, Robert <RMATTSON at harris.com> wrote:
> Dear OpenSSH Team,
>
>
>
> We use OpenSSH ControlPath/multiplexed sessions heavily to avoid session handshake overheads in a system of CentOS machines performing remote shell commands. The commands are run at different times and take different amounts of time to complete. We're currently experiencing a failure of sshd which relies on timing to trigger a fault and elicit a failure. The failure is typically a process termination. Depending on the loop used to reproduce a crash, the time required can be between five minutes and one hour (approx).

Not sure of why the issue occurs, but for *anything* that connects
OpenSSH to a lot of distinct targets, make sure you turn off the the
reverse DNS looks on every server you connect to, This means changing
the init script options to use "sshd -u0" in the SSH init scripts on
every remote server. For RHEL 6, it means putting "OPTIONS=-u0" in
/etc/sysconfig/sshd. Might not solve your issues, but it's really
helpful to reduce connection delays and improve performance.

The other weaselish way to achieve what you're doing is to run an "ssh
bash" command, connected to a pipe on the local side, and send your
commands through a pipe to the remote bash. It's been.... 15 years
since I last pulled that one, but you may find it helpful. Protecting
and terminating the pipe well is an adventure in local security, but
it could help eliminate the reliance on a feature that is probably not
well tested for RHEL.

Another option, for testing, is to compile OpenSSH locally, say in
/opt/openssh/7.1, and run a much more recent client to see if you can
reproduce the error. You caould also test and run a separate OpenSSH
daemon on a different port on a few target servers, ideally with
locked down privileges, segregated this way to prevent interference
with the default OpenSSH. It can even be built as an RPM, though I'd
be cautious about replacing the system standard version of OpenSSH.




> On CentOS, failure messages look like [3], on RHEL, reproduced failure messages look like [4].
>
>
>
> Previously RedHat supported and fixed a ~very~ similar issue (see ChangeLog entry[1]), however it appears we still see sshd crashes in our system on both CentOS and RHEL (which we know are compiled from the same source code) when using this feature. We are running the latest RPMS for OpenSSH - 6.6.1p1-25.el7_2.x86_64.rpm on both the client and server hosts.
>
> When investigating the issue previously, RedHat documented the method we used to reproduce the failure in some detail[2].
> I do swap the while loop described on that page for a different loop which is can elicit a failure sooner.
>
> The loop used to produce the error in [4] was [5], while I have also written/used [6],[7],[8]. There is probably an infinite number of possible loops.
>
>
>
> I tried to reproduce this in OpenBSD last May/June and did not see this failure, however, because of the issues around timing, not seeing a failure does not give me confidence there is no fault present.
>
>
>
> Is there any guidance available on how this can be resolved?
> ControlMaster is an important and very good feature of OpenSSH without which our remote execution system would not be nearly as efficient or scalable.
>
>
>
> It is my intention to open a case with RH but sometimes it is difficult to progress such support cases, I'm hoping I can get some assistance to help get this stabilised from wherever I can find it.
>
> Sincerely,
>
> Rob Mattson
>
>
>
>
> Dr Robert Mattson
> Product Development
> 380 St Kilda Road,
> Melbourne, Victoria, 3004
> Australia
>
> +61 3 9926 0000  phone
> robert.mattson at harris.com<mailto:robert.mattson at harris.com>
> www.c4i.com<http://www.c4i.com/>
>
> [1] rpm -q --changelog openssh
> * Wed Jul 15 2015 Jakub Jelen <jjelen at redhat.com<mailto:jjelen at redhat.com>> 6.6.1p1-15 + 0.9.3-9
> ...
> - Fix race condition with auditing messages answers (#1240613)
>
> [2] - https://access.redhat.com/solutions/1521923
>
> [3] from /var/log/secure
> Apr 26 10:42:58 host01 sshd[15138]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 2
> Apr 26 10:42:58 host01 sshd[15138]: error: buffer_get_string_ret: buffer_get failed
> Apr 26 10:42:58 host01 sshd[15138]: fatal: buffer_get_string: buffer error
> <snip />
> Apr 26 10:56:58 host01 sshd[15293]: fatal: mm_request_receive: read: bad msg_len 4096768
> Apr 26 10:56:58 host01 sshd[15293]: pam_unix(sshd:session): session closed for user targetuser
> Apr 26 10:56:58 host01 sshd[15296]: fatal: mm_request_receive: read: Connection reset by peer
> Apr 26 10:56:58 host01 sshd[15296]: fatal: mm_request_send: write: Broken pipe
> <snip />
> Apr 26 10:58:58 host01 sshd[16028]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 53
> Apr 26 10:58:58 host01 sshd[16028]: error: buffer_get_string_ret: buffer_get failed
> Apr 26 10:58:58 host01 sshd[16028]: fatal: buffer_get_string: buffer error
> Apr 26 10:58:58 host01 sshd[16028]: pam_unix(sshd:session): session closed for user targetuser
> Apr 26 10:58:58 host01 sshd[16032]: fatal: mm_request_receive: read: Connection reset by peer
> Apr 26 10:58:58 host01 sshd[16032]: fatal: mm_request_send: write: Broken pipe
> <snip />
> Apr 26 11:07:58 host01 sshd[16149]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 50
> Apr 26 11:07:58 host01 sshd[16149]: error: buffer_get_string_ret: buffer_get failed
> Apr 26 11:07:58 host01 sshd[16149]: fatal: buffer_get_string: buffer error
> Apr 26 11:07:58 host01 sshd[16149]: pam_unix(sshd:session): session closed for user targetuser
> Apr 26 11:07:58 host01 sshd[16152]: fatal: mm_request_receive: read: Connection reset by peer
> Apr 26 11:07:58 host01 sshd[16152]: fatal: mm_request_send: write: Broken pipe
> <snip />
> Apr 26 11:12:58 host01 sshd[17475]: fatal: mm_request_receive: read: bad msg_len 4529920
> Apr 26 11:12:58 host01 sshd[17475]: pam_unix(sshd:session): session closed for user targetuser
> Apr 26 11:12:58 host01 sshd[17501]: fatal: mm_request_receive: read: Connection reset by peer
> Apr 26 11:12:58 host01 sshd[17501]: fatal: mm_request_send: write: Broken pipe
>
>
>
> [4]
> time /usr/sbin/sshd -p 1026 -ddd
> <snip />
>
> debug2: fd 19 setting O_NONBLOCK
> debug2: fd 11 setting O_NONBLOCK
> debug2: fd 22 setting O_NONBLOCK
> debug2: channel 4: rcvd eof
> debug2: channel 4: output open -> drain
> debug2: channel 4: obuf empty
> debug2: channel 4: close_write
> debug2: channel 4: output drain -> closed
> debug2: channel 3: read 664 from efd 20
> debug2: channel 4: read 664 from efd 22
> debug2: channel 3: rwin 2097152 elen 664 euse 1
> debug2: channel 3: sent ext data 664
> debug2: channel 4: rwin 2097152 elen 664 euse 1
> debug2: channel 4: sent ext data 664
> debug3: mm_request_receive entering
> debug3: monitor_read: checking request 124
> buffer_get_ret: trying to get more bytes 68 than in buffer 63             <----*******
> buffer_get_string_ret: buffer_get failed                                  <----*******
> buffer_get_string: buffer error                                           <----*******
> debug1: do_cleanup
> debug1: PAM: cleanup
> debug1: PAM: closing session
> debug1: PAM: deleting credentials
> debug3: PAM: sshpam_thread_cleanup entering
>
> real    8m4.228s
> user    0m0.506s
> sys     0m1.884s
> [root at rhel ~]# debug1: server_input_channel_open: ctype session rchan 12 win 2097152 max 32768
> debug1: input_session_request
> debug1: channel 5: new [server-session]
> debug1: session_new: session 7
> debug1: session_open: channel 5
> debug1: session_open: session 7: link with channel 5
> debug1: server_input_channel_open: confirm session
> debug1: server_input_channel_req: channel 5 request env reply 0
> debug1: session_by_channel: session 7 channel 5
> debug1: session_input_channel_req: session 7 req env
> debug2: Setting env 0: LANG=en_AU.UTF-8
> debug1: server_input_channel_req: channel 5 request exec reply 1
> debug1: session_by_channel: session 7 channel 5
> debug1: session_input_channel_req: session 7 req exec
> Starting session: command for joe from ::1 port 41194
> debug3: mm_audit_run_command entering command sleep .2
> debug3: mm_request_send entering: type 114
> mm_request_send: write: Broken pipe
> debug1: do_cleanup
> debug3: PAM: sshpam_thread_cleanup entering
> debug3: mm_request_send entering: type 124
> mm_request_send: write: Broken pipe
> [root at rhel ~]#
>
>
>
> [5]
> while true; do
>         runCommand .22
>         runCommand .2
>         runCommand .2
>         runCommand .2
>         runCommand .2
>         runCommand .2
>         sleep .333
> done
>
> [6]
> while true; do
>        runCommand 3
>        sleep .6
>        runCommand 3
>        sleep .6
>        runCommand 3
>        sleep .6
>        runCommand 3
>        sleep .6
>        runCommand 3
>        sleep .7
> done
>
>
>
> [7]
> while true; do
>     runCommand 3
>     runCommand 1
>     runCommand 1
>     runCommand 1
>     runCommand 1
>     runCommand 2
>     runCommand 2
>     runCommand 2
>     sleep 1
>     runCommand 1
>     sleep 1
> done
>
>
>
> [8]
> while true; do
>     runCommand 1
>     runCommand 1
>     runCommand 1
>     runCommand 1
>     sleep 1
>     runCommand 1
>     sleep 1
> done
>
>
> _______________________________________________
> openssh-unix-dev mailing list
> openssh-unix-dev at mindrot.org
> https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev


More information about the openssh-unix-dev mailing list