SSH 5.8p1 hang in kernel mode / AIX 7.1

Flavien Lebarbe flavien-ssh at lebarbe.net
Fri Dec 28 03:43:48 EST 2012


Hello,


Replying to self, for the archives.

This really looks like an AIX kernel bug that SSH is triggering
somehow (about once every 5000 to 6000 runs here on a test machine).
We have an open issue with IBM on this. Here's the stack we got from kdb.
It's related to CLiC (CryptoLite for C kernel, a kernel extension).


Flavien.

0)> f
pvthread+01A000 STACK:
[F1000000C0339E84].RdTBR+000004 ()
[F1000000C02F9D64]CLiC__trng+000104 (F1000102135279A8)
[F1000000C02FA280]CLiC_rng_seed+0001A0 (F100010213527A80, 0000000000000000,
   0000000000000014)
[F1000000C02FA448]clic_ctxrng_init+000068 (F100010213527980, 0000000400000004)
[F1000000C02FA74C]CLiC_context+00018C (F10001021355B550, 0000000200000002,
   0000000400000004, F1000000C03C3A98, F1000000C03C3AB0)
[F1000000C036D2E8]P11_CLiC_app_init+000108 (F10001021355B458, F00000002FF45FC8)
[F1000000C02C5A9C]p11_init_crypto_ctx+00011C (F10001021355B458, F00000002FF45FC8)
[F1000000C02C5F28]p11_acquire_context+000268 (00000000011E00DC, 0000000100000001,
   F00000002FF45FC8)
[F1000000C02C567C]p11_dd_open+0000FC (8000002200000000, 0000000000000001,
   000BB003000BB003, 0000000000000000)
[00014D70].hkey_legacy_gate+00004C ()
[005769C0]devcopen+000480 (??, ??, ??, ??, ??)
[00576020]rdevopen+000140 (??, ??, ??, ??, ??)
[007E2D90]mpx_open+000070 (F10001020FE0D5F0, 0000000100000001,
   0000000000000000)
[00753E7C]spec_open+0000FC (??, ??, ??, ??, ??)
[005A44F8]vnop_open+0004F8 (??, ??, ??, ??, ??)
[0063FEAC]openpnp+0006EC (??, ??, ??, ??, ??, ??, ??, ??)
[0064056C]openpath+00028C (??, ??, ??, ??, ??, ??, ??, ??)
[00640934]copen+000314 (FFFFFFFEFFFFFFFE, 00000000F084A164,
   0000000000000000, 0000000800000008, 0000000000000000, F00000002FF47580)
[0063F744]kopen+000024 (??, ??, ??)
[0000386C]ovlya_addr_sc_flih_main+00014C ()
[D0119A54]open+0000F4 (F084A164, 00000000, 00000008, 00000001,
   11A000C5, 01A000C5, 00000000, F0731A54)
[D232F934]C_Initialize+000394 (00000000)
[D100DBAC]D100DBAC ()
[D100BA18]D100BA18 ()
[D100B9B0]D100B9B0 ()
[D10109CC]D10109CC ()
[D1009AEC]D1009AEC ()
[10060334]ssh_SSLeay_add_all_algorithms+000014 ()
[100032F0]main+001290 (00000001, 2FF22800)
[100001E8]__start+000098 ()
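
To make the trace easier to read, here is a minimal sketch that reduces
kdb frames to their symbol names and prints the call chain outermost
first. The helper name frame_symbols and the regular expression are my
own and are not part of kdb; the sample frames are a subset copied from
the trace above. Frames that kdb could not resolve to symbol+offset
(such as [D100DBAC]D100DBAC) are skipped.

```python
import re

# Matches "[ADDRESS]symbol+OFFSET", with an optional leading dot on the symbol.
FRAME_RE = re.compile(r"^\[[0-9A-F]+\]\.?([A-Za-z_][A-Za-z0-9_]*)\+[0-9A-F]+")

def frame_symbols(kdb_output):
    """Return resolved symbol names, innermost frame first (kdb order)."""
    symbols = []
    for line in kdb_output.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # argument continuation lines and unresolved frames do not match
            symbols.append(match.group(1))
    return symbols

SAMPLE = """\
[F1000000C0339E84].RdTBR+000004 ()
[F1000000C02F9D64]CLiC__trng+000104 (F1000102135279A8)
[F1000000C02FA280]CLiC_rng_seed+0001A0 (F100010213527A80, 0000000000000000,
   0000000000000014)
[F1000000C02C567C]p11_dd_open+0000FC (8000002200000000, 0000000000000001,
   000BB003000BB003, 0000000000000000)
[D232F934]C_Initialize+000394 (00000000)
[D100DBAC]D100DBAC ()
[10060334]ssh_SSLeay_add_all_algorithms+000014 ()
[100032F0]main+001290 (00000001, 2FF22800)
"""

if __name__ == "__main__":
    # Reverse so the chain reads from main() down to the innermost frame.
    print(" -> ".join(reversed(frame_symbols(SAMPLE))))
```

Read this way, the chain goes from main through
ssh_SSLeay_add_all_algorithms and C_Initialize into the p11 device open,
and ends in the CLiC RNG seeding code.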



Flavien Lebarbe wrote :
> Hello,
> 
> 
> 
> An AIX machine runs a program that forks the ssh client in order to
> launch commands on a remote host. I first set up a master connection
> with a ControlPath, then use that connection to launch various
> commands on the remote host, and finally stop the master by issuing
> an "-O exit" command.
> 
> SSH client version on that machine is :
> # ssh -V       
> OpenSSH_5.8p1, OpenSSL 0.9.8r 8 Feb 2011
> # uname -nrsv
> AIX P7_AIX7 1 7
> 
> The program runs every 5 minutes for about 10s or so, gathering the
> information from the remote just fine.
> 
> Now, I'm looking at the output of "ps" and see some left-over processes :
>     root  5832832        1  69   22 nov      - 5424:59 ssh -o BatchMode=yes -o ControlPath=/opt/data/ssh-socket_A-10.10.14.126 -o User=foobar 10.10.14.126 remote_command
> 
> This instance of ssh client should not be there anymore.
> 
> Having a deeper look:
> * kill -9 on that process does not kill it.
> * The corresponding ControlPath socket does not exist anymore on the system,
>   nor does the ssh master process for this socket.
> * truss on that process does not show any activity at all: the process is
>   apparently inside a system call.
> * kernel activity on the machine as reported by topas is 99%
> * ls -l /proc/5832832/fd only shows 3 FDs : 
>     # ls -l /proc/5832832/fd
>     0 total
>     c---------    1 root     system        2,  2 14 déc 11:22 0
>     p---------    0 root     system            0 22 nov 06:18 1
>     c---------    1 root     system        2,  2 14 déc 11:22 2
> 
> I currently have 6 of those processes on this system. Some of them
> have been running for weeks, like the one above; others for days.
> 
> This situation looks like a kernel bug to me. Do you have any idea
> what in the OpenSSH code might be triggering it in this old version
> of OpenSSH?
> 
> 
> Thanks,
> Flavien.
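
The master / command / "-O exit" sequence described in the quoted
message can be sketched as follows. This is only an illustration under
assumptions: the quoted message does not say how the master is started,
so the -M -N -f flags are a guess, and the function names are
hypothetical. The per-command argument list mirrors the command line
seen in the ps output.

```python
import subprocess

def master_argv(host, user, control_path):
    # Assumption: master started with -M (master mode), -N (no remote
    # command) and -f (background after authentication).
    return ["ssh", "-M", "-N", "-f",
            "-o", "BatchMode=yes",
            "-o", "ControlPath=" + control_path,
            "-o", "User=" + user,
            host]

def command_argv(host, user, control_path, remote_command):
    # Same shape as the leftover process shown by ps.
    return ["ssh",
            "-o", "BatchMode=yes",
            "-o", "ControlPath=" + control_path,
            "-o", "User=" + user,
            host, remote_command]

def exit_argv(host, control_path):
    # Asks the master listening on control_path to exit.
    return ["ssh", "-o", "ControlPath=" + control_path, "-O", "exit", host]

def run_session(host, user, control_path, remote_commands):
    """Start the master, run each command over it, then stop the master."""
    subprocess.run(master_argv(host, user, control_path), check=True)
    try:
        return [
            subprocess.run(command_argv(host, user, control_path, cmd),
                           capture_output=True, text=True)
            for cmd in remote_commands
        ]
    finally:
        # Always try to tear the master down, even if a command failed.
        subprocess.run(exit_argv(host, control_path), check=False)
```

Note that a timeout in the calling program would not clean up the stuck
clients described here, since even kill -9 does not remove them.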

