9.3p1 Daemon Rejects Client Connections on armv7l-dey-linux-gnueabihf w/ GCC 10/11/12

Grant Erickson gerickson at nuovations.com
Tue Oct 31 17:52:35 AEDT 2023


I have an NXP i.MX6-based armv7l-dey-linux-gnueabihf system in which I
am seeing some as-yet-unaccountable behavior in sshd when compiled with
Arm/GCC 10/11/12. That is, when attempting to scp/slogin/ssh to
'root@<host>', where <host> is either a name or IPv4 or IPv6 address,
the connection is quickly closed by the server without prompting for a
password.

The one variable that consistently determines whether things work is
the toolchain. Under the arm-dey-linux-gnueabi-gcc 8.2.0 from Digi
Embedded Yocto (DEY), scp/slogin/ssh works. Under
arm-none-linux-gnueabihf-gcc 10/11/12, they do not, failing
consistently and with the same failure across all three. The specific
Arm toolchains are:

    https://developer.arm.com/-/media/Files/downloads/gnu-a/10.3-2021.07/binrel/gcc-arm-10.3-2021.07-x86_64-arm-none-linux-gnueabihf.tar.xz
    https://developer.arm.com/-/media/Files/downloads/gnu/11.3.rel1/binrel/arm-gnu-toolchain-11.3.rel1-x86_64-arm-none-linux-gnueabihf.tar.xz
    https://developer.arm.com/-/media/Files/downloads/gnu/12.3.rel1/binrel/arm-gnu-toolchain-12.3.rel1-x86_64-arm-none-linux-gnueabihf.tar.xz
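
For anyone trying to reproduce the comparison, a minimal sketch of how
the toolchains involved might be recorded side by side. It relies only
on the standard GCC driver options -dumpmachine and -dumpversion; the
function name describe_toolchains is hypothetical, and the driver
names passed to it are assumed to be on PATH.

```shell
# Print the target triplet and GCC version for each compiler driver given.
# Drivers that cannot be found are reported rather than treated as errors.
describe_toolchains() {
    for cc in "$@"; do
        if command -v "$cc" >/dev/null 2>&1; then
            printf '%s: target=%s version=%s\n' "$cc" \
                "$("$cc" -dumpmachine)" "$("$cc" -dumpversion)"
        else
            printf '%s: not found on PATH\n' "$cc"
        fi
    done
}

# Example (assumes both drivers are on PATH):
#   describe_toolchains arm-dey-linux-gnueabi-gcc arm-none-linux-gnueabihf-gcc
```

Recording the triplet next to the version makes it easier to see that
the working and failing builds differ in more than the GCC release.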

The original version of openssh under which this was observed was 9.3p1,
configured as follows:

    ${BuildRoot}/third_party/openssh/openssh-9.3p1/configure -C \
        AR="${AR}" CPP="${CPP}" CC="${CC}" CXX="${CXX}" RANLIB="${RANLIB}" STRIP="${STRIP}" \
        CPPFLAGS="--sysroot=${SYSROOT} -mcpu=cortex-a8 -mfloat-abi=hard -mfpu=neon -isystem ${SYSROOT}/usr/include -I${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/ncurses/usr/include -I${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/openssl/usr/include" \
        CFLAGS="--sysroot=${SYSROOT} -mcpu=cortex-a8 -mfloat-abi=hard -mfpu=neon -fno-omit-frame-pointer -fno-strict-aliasing" \
        LDFLAGS="--sysroot=${SYSROOT} -L${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/ncurses/usr/lib/ -L${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/libedit/usr/lib/ -Wl,-rpath-link -Wl,${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/ncurses/usr/lib -Wl,-rpath-link -Wl,${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/zlib/usr/lib" \
        --build=x86_64-pc-linux-gnu \
        --host=arm-dey-linux-gnueabi \
        --target=arm-dey-linux-gnueabi \
        --disable-strip \
        --with-hardening \
        --with-libedit="${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/libedit/usr" \
        --with-mantype=cat \
        --with-openssl \
        --with-pid-dir=/var/run \
        --with-privsep-path=/var/run/sshd \
        --with-ssl-dir="${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/openssl/usr" \
        --with-stackprotect \
        --with-zlib-version-check \
        --with-zlib="${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/zlib/usr" \
        --without-kerberos5 \
        --without-ldns \
        --without-maildir \
        --without-pam \
        --without-rpath \
        --without-selinux \
        --without-xauth \
        --prefix=/usr \
        --sysconfdir=/etc/ssh \
        --localstatedir=/var

Were it just one version, I’d have suspected a code-generation bug in
the compiler; however, with three different versions from three
different GCC eras, I’m inclined to believe this isn’t a
code-generation issue.

In all failures, the ssh client fails with:

    debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

followed by:

    Connection closed by <IP address of server> port 22

In all failures, the ssh daemon fails with:

    debug1: expecting SSH2_MSG_KEX_ECDH_INIT [preauth]
    debug3: receive packet: type 30 [preauth]
    debug3: mm_sshkey_sign entering [preauth]
    debug3: mm_request_send entering: type 6 [preauth]
    debug3: mm_sshkey_sign: waiting for MONITOR_ANS_SIGN [preauth]
    debug3: mm_request_receive_expect entering: type 7 [preauth]
    debug3: mm_request_receive entering [preauth]
    debug3: mm_request_receive entering
    debug3: monitor_read: checking request 6
    debug3: mm_answer_sign
    debug3: mm_answer_sign: hostkey proof signature 0x1164880(100)
    debug3: mm_request_send entering: type 7
    debug2: monitor_read: 6 used once, disabling now
    debug3: send packet: type 31 [preauth]
    debug3: send packet: type 21 [preauth]
    debug2: set_newkeys: mode 1 [preauth]
    debug1: rekey after 134217728 blocks [preauth]
    debug1: monitor_read_log: child log fd closed
    debug3: mm_request_receive entering
    debug1: do_cleanup
    debug1: Killing privsep child 2544
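
One way to pin down where the unprivileged child went quiet is to pull
out the last line it logged before the monitor noticed its log
descriptor closing. The sketch below assumes that lines tagged
"[preauth]" come from the privsep child and untagged lines from the
monitor, which is how the trace above reads; the function name
last_preauth_before_close is hypothetical.

```shell
# Read a saved sshd debug log on stdin and print the last "[preauth]"
# line seen before "monitor_read_log: child log fd closed".
# Exits non-zero if that monitor message never appears.
last_preauth_before_close() {
    awk '
        /monitor_read_log: child log fd closed/ { print last; found = 1; exit }
        /\[preauth\]/ { last = $0 }
        END { if (!found) exit 1 }
    '
}

# Example (sshd.log is a hypothetical file holding the daemon output):
#   last_preauth_before_close < sshd.log
```

Applied to the daemon trace above, this prints the "rekey after
134217728 blocks [preauth]" line, i.e. the child stops reporting
immediately after set_newkeys.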

My first inclination was that this was a SHA-1 key algorithm deprecation
issue; however, I verified that was not the case. And, again, the fact
that the compiler is the only variable suggests it likely is not.

My second inclination was that this was perhaps an optimization issue
with the later versions of GCC, so I compiled OpenSSH with -O0. No
change. Digi DEY 8.2.0 works; Arm GNU Toolchain 10/11/12 does not.

My next inclination was to try a different ssh client. I’d been using
8.2p1 (Ubuntu 20.04); however, 8.1p1 and 8.6p1 (macOS) as well as a
locally-built 9.5p1 yielded the same results: Digi DEY 8.2.0 works;
Arm GNU Toolchain 10/11/12 does not.

My next inclination was to iterate through the sshd_config
configuration. I commented out the 10 lines one by one and retested,
which yielded the same results: Digi DEY 8.2.0 works; Arm GNU Toolchain
10/11/12 does not.

My next inclination was that perhaps OpenSSL was creating an issue. I
tried 1.1.1w (up from my 1.1.1s) and 3.1.4, which yielded the same
results: Digi DEY 8.2.0 works; Arm GNU Toolchain 10/11/12 does not.

My next inclination was that perhaps it was OpenSSH version-specific. I
tried up-revving to 9.5p1 and then down-revving to 7.9p1, which yielded
the same results: Digi DEY 8.2.0 works; Arm GNU Toolchain 10/11/12 does
not.

My last inclination was to do a side-by-side comparison of the
configuration and compilation output between Digi DEY 8.2.0 and Arm GNU
Toolchain 12. The key differences were in the following configure
checks:

    if ${CC} supports compile flag -fzero-call-used-regs=all
    if ${CC} supports compile flag -ftrivial-auto-var-init=zero
    for sys/sysctl.h
    for library containing login
    for closefrom
    for close_range
    for library containing dlopen
    for arc4random
    for arc4random_buf
    for arc4random_uniform
    if libc defines sys_errlist
    if libc defines sys_nerr
    for library containing res_query
    for library containing dn_expand
    if res_query will link
    for _getshort
    for _getlong

While most of these configuration differences seem trivial and innocuous,
the -fzero-call-used-regs=all and -ftrivial-auto-var-init=zero compiler
language / code generation options seemed the most likely among those
differences to impact the point at which the client/daemon interaction
seemed to be failing. So, I forcibly disabled both, which yielded the
same results: Digi DEY 8.2.0 works; Arm GNU Toolchain 10/11/12 does not.
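
To double-check that the two options really are gone from a build, one
approach is to search the generated Makefile for them. This is only a
sketch: it assumes configure records the flags it accepted in the
generated Makefile, and the function name report_hardening_flags is
hypothetical.

```shell
# Report whether each of the two code-generation flags appears anywhere
# in the given file (for example, a generated Makefile).
# Usage: report_hardening_flags path/to/Makefile
report_hardening_flags() {
    for flag in -fzero-call-used-regs -ftrivial-auto-var-init; do
        # -F matches the flag literally; -e keeps the leading dash from
        # being parsed as a grep option.
        if grep -qF -e "$flag" "$1"; then
            printf '%s: present\n' "$flag"
        else
            printf '%s: absent\n' "$flag"
        fi
    done
}

# Example (run from the build directory):
#   report_hardening_flags Makefile
```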

Does anyone recognize this as a familiar failure mode? Beyond that, any
thoughts or recommendations on zeroing in further on the potential root
cause?

Best,

Grant


More information about the openssh-unix-dev mailing list