SSH 5.8p1 hang in kernel mode / AIX 7.1

Flavien Lebarbe flavien-ssh at lebarbe.net
Fri Dec 14 22:18:21 EST 2012


Hello,



An AIX machine runs a program that forks the ssh client in order to
launch commands on a remote host. I first set up a master connection
with a ControlPath, then use that connection to launch various
commands on the remote host, and finally kill the master by issuing an
"-O exit" command.

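For readers unfamiliar with connection multiplexing, the workflow above can be sketched roughly as follows. This is only an illustration, not the actual program: the function names, the SSH_BIN, REMOTE_HOST and CTL_SOCK variables, and the -M/-N/-f flags used to start the master are assumptions. The socket path, user, host and BatchMode option are taken from the "ps" output quoted further down.

```shell
#!/bin/sh
# Sketch of a master/client ssh workflow over a shared ControlPath socket.
# Assumption: all variable and function names here are hypothetical.

SSH_BIN=${SSH_BIN:-ssh}
REMOTE_HOST=${REMOTE_HOST:-10.10.14.126}
CTL_SOCK=${CTL_SOCK:-/opt/data/ssh-socket_A-$REMOTE_HOST}

# Options shared by every invocation.
ssh_base() {
    "$SSH_BIN" -o BatchMode=yes -o ControlPath="$CTL_SOCK" -o User=foobar "$@"
}

# 1. Start the master in the background
#    (-M: master mode, -N: no remote command, -f: go to background).
master_start() { ssh_base -M -N -f "$REMOTE_HOST"; }

# 2. Run a command through the existing master connection.
remote_run() { ssh_base "$REMOTE_HOST" "$@"; }

# 3. Ask the master to exit; it removes the control socket on the way out.
master_stop() { ssh_base -O exit "$REMOTE_HOST"; }
```

A run would then look like "master_start && remote_run remote_command; master_stop". The hung processes described below correspond to step 2.
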
The SSH client version on that machine is:
# ssh -V       
OpenSSH_5.8p1, OpenSSL 0.9.8r 8 Feb 2011
# uname -nrsv
AIX P7_AIX7 1 7

The program runs every 5 minutes for about 10 seconds, and gathers the
information from the remote host just fine.

However, the output of "ps" shows some left-over processes:
    root  5832832        1  69   22 nov      - 5424:59 ssh -o BatchMode=yes -o ControlPath=/opt/data/ssh-socket_A-10.10.14.126 -o User=foobar 10.10.14.126 remote_command

This instance of the ssh client should no longer exist.

Having a deeper look:
* kill -9 on that process does not kill it.
* The corresponding ControlPath socket does not exist anymore on the system,
  nor does the ssh master process for this socket.
* truss on that process does not show any activity at all: the process is
  apparently inside a system call.
* Kernel CPU usage on the machine, as reported by topas, is 99%.
* ls -l /proc/5832832/fd shows only 3 FDs:
    # ls -l /proc/5832832/fd
    0 total
    c---------    1 root     system        2,  2 14 déc 11:22 0
    p---------    0 root     system            0 22 nov 06:18 1
    c---------    1 root     system        2,  2 14 déc 11:22 2

I currently have 6 of these processes running on this system. Some of them
have been running for weeks, like the one above. Others have been running
for days.
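
A rough way to list such processes is sketched below. It is only an illustration under stated assumptions: the function name list_stale_ssh is hypothetical, and it assumes that "ps" accepts the POSIX "-o pid= -o ppid= -o args=" format and that the ControlPath contains no spaces. It prints the PID of every ssh client that has been re-parented to init (PPID 1) and whose ControlPath socket no longer exists.

```shell
#!/bin/sh
# Reads "pid ppid args..." lines on stdin and prints the PIDs of ssh clients
# that are children of init and whose ControlPath socket is missing.
# Assumption: list_stale_ssh is a hypothetical helper name.
list_stale_ssh() {
    while read -r pid ppid args; do
        # Only consider processes whose parent is init.
        [ "$ppid" = 1 ] || continue
        # Only consider ssh clients started with a ControlPath option.
        case $args in
            ssh\ *ControlPath=*|*/ssh\ *ControlPath=*) ;;
            *) continue ;;
        esac
        # Extract the text after "ControlPath=" up to the next space.
        sock=${args#*ControlPath=}
        sock=${sock%% *}
        # Report the PID when the socket is gone.
        [ -S "$sock" ] || echo "$pid"
    done
}

# Usage:
#   ps -e -o pid= -o ppid= -o args= | list_stale_ssh
```

This only identifies candidates. It does not explain the hang, and since kill -9 has no effect on these processes it cannot clean them up either.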

This situation looks like a kernel bug to me. Do you have any idea what
in the code of this old version of OpenSSH might be triggering it?


Thanks,
Flavien.

