[Bug 3639] server thread aborts during client login after receiving SSH2_MSG_KEXINIT due to ssh_sandbox_violation

bugzilla-daemon at mindrot.org bugzilla-daemon at mindrot.org
Mon Dec 18 18:00:16 AEDT 2023


https://bugzilla.mindrot.org/show_bug.cgi?id=3639

--- Comment #26 from JM <jtm.moon.forum.user+mindrot at gmail.com> ---
tl;dr a seccomp sandbox violation `20` occurs from a `read` (still).
      This is just a more detailed retelling of what was previously
discussed.
      Scroll to end for thoughts...

### problem specifics

Failed `read` in parent process

    read(fd, s + pos, n - pos);

which is

    read(5, '\014\221b', 4);

returns `0`.

Failed `read` will cause the following audit event (journalctl -f -x)

    Dec 17 15:59:35 host1 kernel: audit: type=1326
audit(1702857575.824:3180): auid=4294967295 uid=107 gid=65534
ses=4294967295 pid=1920 comm="sshd"
exe="/root/Projects/openssh-9.2p1-WIP/sshd" sig=31 arch=40000028
syscall=20 compat=1 ip=0xf7afd80c code=0x0

And the same when compiled with
`CFLAGS="-DDSANDBOX_SECCOMP_FILTER_DEBUG"`

    Dec 17 22:37:50 pifuboo audit[10678]: SECCOMP auid=4294967295
uid=107 gid=65534 ses=4294967295 pid=10678 comm="sshd"
exe="/root/Projects/openssh-9.2p1-WIP/sshd" sig=31 arch=40000028
syscall=20 compat=1 ip=0xf77de80c code=0x0
    Dec 17 22:37:50 pifuboo audit[10678]: ANOM_ABEND auid=4294967295
uid=107 gid=65534 ses=4294967295 pid=10678 comm="sshd"
exe="/root/Projects/openssh-9.2p1-WIP/sshd" sig=31 res=1

The failed linux system call `20` is `epoll_create1` according to
`ausyscall`

    $ ausyscall 20
    epoll_create1

So the `read` at some point calls syscall `20`. See section "Summary
Thoughts" about this.

Here is the failed `read` call
https://github.com/openssh/openssh-portable/blob/V_9_2_P1/atomicio.c#L66
It is always the `read` call with values `fd=5`, `n=4`.
`read` returns `0`.
`errno` is not changed after `read` returns.

The stack just before the failed `read` call is:

    #1  0x004701f8 in atomicio6
        (f=f at entry=0xf7c65478 <read>, fd=fd at entry=5, _s=0xfffeead8,
_s at entry=0xfffeead0, n=n at entry=4, cb=cb at entry=0x0,
cb_arg=cb_arg at entry=0x0)
        at atomicio.c:67
    #2  0x00470284 in atomicio
        (f=f at entry=0xf7c65478 <read>, fd=fd at entry=5,
_s=_s at entry=0xfffeead0, n=n at entry=4)
        at atomicio.c:101
    #3  0x00434520 in mm_request_receive
        (sock=5, m=m at entry=0x4f2b88)
        at monitor_wrap.c:149
    #4  0x00431178 in monitor_read
        (ssh=ssh at entry=0x4f3388, pmonitor=pmonitor at entry=0x4f2498,
ent=0x4e0114 <mon_dispatch_proto20>, pent=pent at entry=0xfffeeb78)
        at monitor.c:501
    #5  0x00433b5c in monitor_child_preauth
        (ssh=ssh at entry=0x4f3388, pmonitor=0x4f2498)
        at monitor.c:301
    #6  0x0040a388 in privsep_preauth
        (ssh=0x4f3388)
        at sshd.c:502
    #7  main
        (ac=<optimized out>, av=0x4e31a0)
        at sshd.c:2240

(line numbers in `atomicio.c` slightly off due to insertion of
`raise(SIGINT);`)

The debug log from the start of client connection is:

    debug3: fd 5 is not O_NONBLOCK
    debug1: Server will not fork when running in debugging mode.
    debug3: send_rexec_state: entering fd = 8 config len 3296
    debug3: ssh_msg_send: type 0
    debug3: send_rexec_state: done
    debug1: rexec start in 5 out 5 newsock 5 pipe -1 sock 8
    process 32277 is executing new program:
/root/Projects/openssh-9.2p1/sshd
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library
"/lib/arm-linux-gnueabihf/libthread_db.so.1".
    debug3: recv_rexec_state: entering fd = 5
    debug3: ssh_msg_recv entering
    debug3: recv_rexec_state: done
    debug2: parse_server_config_depth: config rexec len 3296
    debug3: rexec:14 setting Port 55222
    debug3: rexec:22 setting HostKey
/root/Projects/openssh-9.2p1/ssh_host_ecdsa_key
    debug3: rexec:23 setting HostKey
/root/Projects/openssh-9.2p1/ssh_host_ed25519_key
    debug3: rexec:24 setting HostKey
/root/Projects/openssh-9.2p1/ssh_host_rsa_key
    debug3: rexec:45 setting AuthorizedKeysFile .ssh/authorized_keys
    debug3: rexec:113 setting Subsystem sftp       
/usr/libexec/sftp-server
    debug1: sshd version OpenSSH_9.2, OpenSSL 1.1.1w  11 Sep 2023
    debug1: private host key #0: ecdsa-sha2-nistp256
SHA256:7OUDaY7vmsaJPDkqGWPmdiw5kjY4bVSwCd94nJqT7/o
    debug1: private host key #1: ssh-ed25519
SHA256:CuPO+bnbHMCkaNEybTHeYSjdNpiNdAlntO9gh0V9lxs
    debug1: private host key #2: ssh-rsa
SHA256:ZYZLLhbWdOMFKDGw3pcn954Wz6RhwtDoI5WjJsZpXhk
    debug1: inetd sockets after dupping: 3, 3
    debug3: process_channel_timeouts: setting 0 timeouts
    debug3: channel_clear_timeouts: clearing
    Connection from 192.168.124.214 port 57930 on 192.168.124.214 port
55222 rdomain ""
    debug1: Local version string SSH-2.0-OpenSSH_9.2
    debug1: Remote protocol version 2.0, remote software version
OpenSSH_8.4p1 Raspbian-5+deb11u2
    debug1: compat_banner: match: OpenSSH_8.4p1 Raspbian-5+deb11u2 pat
OpenSSH* compat 0x04000000
    debug2: fd 3 setting O_NONBLOCK
    debug3: ssh_sandbox_init: preparing seccomp filter sandbox
    [Detaching after fork from child process 32308]
    debug2: Network child is on pid 32308
    debug3: preauth child monitor started
    debug3: privsep user:group 107:65534 [preauth]
    debug1: permanently_set_uid: 107/65534 [preauth]
    debug3: ssh_sandbox_child: setting PR_SET_NO_NEW_PRIVS [preauth]
    debug3: ssh_sandbox_child: attaching seccomp filter program
[preauth]
    debug1: monitor_read_log: child log fd closed
    debug3: mm_request_receive: entering

Dumping `/proc/$parentpid/status` just before the `read` failure shows:

    Seccomp:        0
    Seccomp_filters:        0

Dumping `/proc/$childpid/status` just before the `read` failure shows:

    Seccomp:        3
    Seccomp_filters:        1

File descriptor 5 of the parent process is a STREAM (according to
`lsof`)

    $ lsof -p 11715
    COMMAND   PID USER   FD   TYPE             DEVICE SIZE/OFF    NODE
NAME
    sshd    11715 root  cwd    DIR              179,2     4096       2
/
    sshd    11715 root  rtd    DIR              179,2     4096       2
/
    sshd    11715 root  txt    REG              179,2  1318404   72912
/root/Projects/openssh-9.2p1-WIP/sshd
    sshd    11715 root  mem    REG              179,2    42628    2737
/lib/arm-linux-gnueabihf/libnss_files-2.31.so
    sshd    11715 root  mem    REG              179,2   116324    4913
/lib/arm-linux-gnueabihf/libgcc_s.so.1
    sshd    11715 root  mem    REG              179,2   137364    2748
/lib/arm-linux-gnueabihf/libpthread-2.31.so
    sshd    11715 root  mem    REG              179,2    13864    2668
/lib/arm-linux-gnueabihf/libdl-2.31.so
    sshd    11715 root  mem    REG              179,2  1315688    2667
/lib/arm-linux-gnueabihf/libc-2.31.so
    sshd    11715 root  mem    REG              179,2    95880   11965
/lib/arm-linux-gnueabihf/libz.so.1.2.11
    sshd    11715 root  mem    REG              179,2  2150824   11138
/usr/lib/arm-linux-gnueabihf/libcrypto.so.1.1
    sshd    11715 root  mem    REG              179,2     9796    2753
/lib/arm-linux-gnueabihf/libutil-2.31.so
    sshd    11715 root  mem    REG              179,2   210340    1499
/lib/arm-linux-gnueabihf/libcrypt.so.1.1.0
    sshd    11715 root  mem    REG              179,2    17708   14672
/usr/lib/arm-linux-gnueabihf/libarmmem-v7l.so
    sshd    11715 root  mem    REG              179,2   146888    2655
/lib/arm-linux-gnueabihf/ld-2.31.so
    sshd    11715 root    0u   CHR                1,3      0t0       5
/dev/null
    sshd    11715 root    1u   CHR                1,3      0t0       5
/dev/null
    sshd    11715 root    2u   CHR              136,4      0t0       7
/dev/pts/4
    sshd    11715 root    3u  IPv4            8165255      0t0     TCP
localhost:55522->localhost:48024 (ESTABLISHED)
    sshd    11715 root    5u  unix 0x00000000c618f590      0t0 8165266
type=STREAM

### Other miscellaneous observations:

* the child process quickly becomes "defunct"

Oddly, I can see that a child process is created by debug-printing PIDs
at certain points. e.g. a debug log message prints

    debug2: Network child is on pid 11719

But later, just before the failed `read`, that child process is in a
"defunct" state. e.g. command `ps -ef` shows

    $ ps -ef
    ...
    sshd: [accepted]
    [sshd] <defunct>

I suspect the child process is immediately dying and the later parent
process `read` then fails.

* Years ago on this same system, I locally built 8.4p1, 8.6p1, 9.0p1,
that have run just fine.
  8.4p1 was built Feb 2021
  8.6p1 was built Jun 2022
  9.0p1 was built Jun 2022

  Yet, I downloaded those same old versions today and they failed. Each
hit the same child process abort.

* I verified the address of `f` is function `read`
(https://github.com/openssh/openssh-portable/blob/V_9_2_P1/atomicio.c#L66)
  with code snippet:

      if (f == read) {
         debug3("read(%d, '%s', %u); (errno=%u)", fd, s + pos, n - pos,
errno);
      } else {
         debug3("(f=%p) (%d, '%s', %u); (errno=%u)", f, fd, s + pos, n
- pos, errno);
      }

> Could you add try adding a similar printf+getpid+exit

* verified within the `sshd` process, `__NR_epoll_create1 = 357` and
`__NR_getpid = 20` via `debug3` prints, e.g. code

    debug3("__NR_getpid = %d", __NR_getpid);
    debug3("__NR_epoll_create1 = %d", __NR_epoll_create1);
    int _pid = getpid();
    debug3("getpid() = %d", _pid);

* built with `./configure --with-sandbox=no` and it runs okay (no child
process aborts)

* other sandboxes failed to compile due to missing headers or kernel
capabilities (and I didn't feel like chasing these down)
  * systrace
  * rlimit
  * capsicum

* Various `fcntl` `GET` checks of file descriptor 5. `errno` was set to
`0` before each call to `fcntl`.

    fcntl(5, F_GETFD) returned 1 (0x00000001) (errno=0)
    fcntl(5, F_GETOWN_EX) returned 0 (0x00000000) (errno=0)
owner.type=0, owner.pid=0
    fcntl(5, F_GETOWN) returned 0 (0x00000000) (errno=0)
    fcntl(5, F_GETPIPE_SZ) returned -1 (0xffffffff) (errno=9)
    fcntl(5, F_GET_SEALS) returned -1 (0xffffffff) (errno=22)
    fcntl(5, F_GETLEASE ) returned 2 (0x00000002) (errno=0)

* for posterity, if anyone else can repro this,
  then manually add this code in `atomicio.c` function `atomicio6` to
cause a GDB break:

    if (fd == 5 && n == 4 && pos == 0 && errno == 32) {
        raise(SIGINT);
    }

  Those are the happenstance values before the failed `read` call.
  Add the prior snippet just before code:

      res = (f) (fd, s + pos, n - pos);

   gdb command:

       $ gdb --args "$(realpath .)/sshd" -ddddd -f sshd_config

### Summary Thoughts

> So perhaps the problem here is that either it's picking up 32bit vs 64bit headers, or that the binary is some kind of 32bit compatibility mode but the sandbox is expecting the 64bit syscalls.

Looking at the above `grep -r -Ee '__NR_getpid|__NR_epoll_create1'`,
maybe the resident compiled libc
used header `/usr/include/asm-generic/unistd.h` where `#define
__NR_epoll_create1 20`,
and openssh-server builds is uses header
`/usr/include/arm-linux-gnueabihf/asm/unistd-eabi.h`
where `#define __NR_getpid (__NR_SYSCALL_BASE + 20)` (or vice-versa).

Looking at the files

    $ l /usr/include/arm-linux-gnueabihf/asm/unistd-eabi.h
    -rw-r--r-- 1 root root 19938 Apr  5  2023
/usr/include/arm-linux-gnueabihf/asm/unistd-eabi.h

    $ l /usr/include/asm-generic/unistd.h
    -rw-r--r-- 1 root root 31480 Apr  5  2023
/usr/include/asm-generic/unistd.h

Looking at the compiled `libc`

    $ find /usr -name 'libc.so'
    /usr/lib/arm-linux-gnueabihf/libc.so

    $ ls -l /usr/lib/arm-linux-gnueabihf/libc.so
    -rw-r--r-- 1 root root 289 Oct  3 12:55
/usr/lib/arm-linux-gnueabihf/libc.so

So maybe my include or library pathing is was reconfigured (messed up)
some time in October 2023 (oh man, how screwed am I?)

### in the meantime ...

I have a workaround using `--with-sandbox=no`.

If you'd like me to try something else then please let me know.
Otherwise, I've spent a fair amount of time in this rabbit hole and
need to get back to other things (i.e. other rabbit holes ��).


-James Moon
(https://github.com/jtmoon79/)
(https://twitter.com/jtmoon1979/)

-- 
You are receiving this mail because:
You are watching someone on the CC list of the bug.
You are watching the assignee of the bug.


More information about the openssh-bugs mailing list