SSH 5.8p1 hang in kernel mode / AIX 7.1

Kevin Brott kevin.brott at gmail.com
Sat Dec 29 13:38:23 EST 2012


IBM just released OpenSSH 6.0 for AIX 6.1/7.1 (and lsof 4.85) in the last
week or so, so upgrading might be an option.  OTOH, we've been running 5.8p1
on AIX 7100-01-xx for some time now without issues.


On Thu, Dec 27, 2012 at 8:43 AM, Flavien Lebarbe <flavien-ssh at lebarbe.net> wrote:

> Hello,
>
>
> Replying to self, for the archives.
>
> This really looks like an AIX kernel bug that SSH is triggering
> somehow (once every 5000/6000 runs here on a test machine). We have
> an open issue with IBM on this. Here's the stack we got from kdb.
> It's related to CLiC (CryptoLite for C kernel, a kernel extension).
>
>
> Flavien.
>
> 0)> f
> pvthread+01A000 STACK:
> [F1000000C0339E84].RdTBR+000004 ()
> [F1000000C02F9D64]CLiC__trng+000104 (F1000102135279A8)
> [F1000000C02FA280]CLiC_rng_seed+0001A0 (F100010213527A80, 0000000000000000,
>    0000000000000014)
> [F1000000C02FA448]clic_ctxrng_init+000068 (F100010213527980,
>    0000000400000004)
> [F1000000C02FA74C]CLiC_context+00018C (F10001021355B550, 0000000200000002,
>    0000000400000004, F1000000C03C3A98, F1000000C03C3AB0)
> [F1000000C036D2E8]P11_CLiC_app_init+000108 (F10001021355B458,
>    F00000002FF45FC8)
> [F1000000C02C5A9C]p11_init_crypto_ctx+00011C (F10001021355B458,
>    F00000002FF45FC8)
> [F1000000C02C5F28]p11_acquire_context+000268 (00000000011E00DC,
>    0000000100000001, F00000002FF45FC8)
> [F1000000C02C567C]p11_dd_open+0000FC (8000002200000000, 0000000000000001,
>    000BB003000BB003, 0000000000000000)
> [00014D70].hkey_legacy_gate+00004C ()
> [005769C0]devcopen+000480 (??, ??, ??, ??, ??)
> [00576020]rdevopen+000140 (??, ??, ??, ??, ??)
> [007E2D90]mpx_open+000070 (F10001020FE0D5F0, 0000000100000001,
>    0000000000000000)
> [00753E7C]spec_open+0000FC (??, ??, ??, ??, ??)
> [005A44F8]vnop_open+0004F8 (??, ??, ??, ??, ??)
> [0063FEAC]openpnp+0006EC (??, ??, ??, ??, ??, ??, ??, ??)
> [0064056C]openpath+00028C (??, ??, ??, ??, ??, ??, ??, ??)
> [00640934]copen+000314 (FFFFFFFEFFFFFFFE, 00000000F084A164,
>    0000000000000000, 0000000800000008, 0000000000000000, F00000002FF47580)
> [0063F744]kopen+000024 (??, ??, ??)
> [0000386C]ovlya_addr_sc_flih_main+00014C ()
> [D0119A54]open+0000F4 (F084A164, 00000000, 00000008, 00000001,
>    11A000C5, 01A000C5, 00000000, F0731A54)
> [D232F934]C_Initialize+000394 (00000000)
> [D100DBAC]D100DBAC ()
> [D100BA18]D100BA18 ()
> [D100B9B0]D100B9B0 ()
> [D10109CC]D10109CC ()
> [D1009AEC]D1009AEC ()
> [10060334]ssh_SSLeay_add_all_algorithms+000014 ()
> [100032F0]main+001290 (00000001, 2FF22800)
> [100001E8]__start+000098 ()
>
>
>
> Flavien Lebarbe wrote :
> > Hello,
> >
> >
> >
> > An AIX machine runs a program that forks an ssh client in order to
> > launch commands on a remote host. I first set up a master connection
> > with a ControlPath, then use that connection to launch various
> > commands on the remote host, and finally kill the master by issuing
> > a "-O exit" command.
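
For the archives, here is a minimal sketch of the three-step pattern
described above (master connection, commands over the ControlPath,
"-O exit"). It only builds the argument lists; the helper names
(master_cmd, remote_cmd, exit_cmd) are made up for illustration, and the
exact options the original program passes for the master (here
ControlMaster=yes with -N -f) are an assumption, not taken from the report.

```python
import shlex


def _base(control_path, user=None):
    # Options shared by the master and the per-command clients.
    argv = ["ssh", "-o", "BatchMode=yes", "-o", f"ControlPath={control_path}"]
    if user:
        argv += ["-o", f"User={user}"]
    return argv


def master_cmd(host, control_path, user=None):
    # Step 1: open the master connection and go to the background
    # (-N: no remote command, -f: background after authentication).
    # Assumed flags; the original program may start its master differently.
    return _base(control_path, user) + ["-o", "ControlMaster=yes", "-N", "-f", host]


def remote_cmd(host, control_path, command, user=None):
    # Step 2: run one command through the existing master socket.
    return _base(control_path, user) + [host, command]


def exit_cmd(host, control_path):
    # Step 3: ask the master to exit via the control socket.
    return ["ssh", "-o", f"ControlPath={control_path}", "-O", "exit", host]


if __name__ == "__main__":
    sock = "/opt/data/ssh-socket_A-10.10.14.126"
    host = "10.10.14.126"
    for argv in (
        master_cmd(host, sock, user="foobar"),
        remote_cmd(host, sock, "remote_command", user="foobar"),
        exit_cmd(host, sock),
    ):
        # Print the commands instead of running them; pass argv to
        # subprocess.run() to actually execute.
        print(shlex.join(argv))
```

The second command line matches the shape of the left-over process shown
in the "ps" output further down, i.e. it is one of the per-command
clients that gets stuck, not the master.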
> >
> > SSH client version on that machine is :
> > # ssh -V
> > OpenSSH_5.8p1, OpenSSL 0.9.8r 8 Feb 2011
> > # uname -nrsv
> > AIX P7_AIX7 1 7
> >
> > The program runs every 5 minutes for about 10s or so, gathering the
> > information from the remote just fine.
> >
> > Now, I'm looking at the output of "ps" and see some left-over processes :
> >     root  5832832        1  69   22 nov      - 5424:59 ssh -o BatchMode=yes -o ControlPath=/opt/data/ssh-socket_A-10.10.14.126 -o User=foobar 10.10.14.126 remote_command
> >
> > This instance of ssh client should not be there anymore.
> >
> > Having a deeper look:
> > * kill -9 on that process does not kill it.
> > * The corresponding ControlPath socket does not exist anymore on the
> >   system, nor does the ssh master process for this socket.
> > * truss on that process does not show any activity at all: the process is
> >   apparently inside a system call.
> > * kernel activity on the machine as reported by topas is 99%
> > * ls -l /proc/5832832/fd only shows 3 FDs :
> >     # ls -l /proc/5832832/fd
> >     0 total
> >     c---------    1 root     system        2,  2 14 dé 11:22 0
> >     p---------    0 root     system            0 22 nov 06:18 1
> >     c---------    1 root     system        2,  2 14 dé 11:22 2
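
A rough sketch of how such left-over clients could be spotted
automatically: it parses the text produced by
"ps -eo pid,ppid,time,args" and reports ssh processes that have been
reparented to PID 1 and have accumulated a lot of CPU time. The function
names and the one-hour threshold are arbitrary choices for illustration,
and the assumption that the TIME column looks like [dd-]hh:mm:ss or
mm:ss should be checked on the target system. It only reports; as noted
above, kill -9 has no effect on these processes.

```python
import os


def cpu_seconds(field):
    """Convert a ps TIME field ([dd-]hh:mm:ss or mm:ss) to seconds."""
    days = 0
    if "-" in field:
        day_part, field = field.split("-", 1)
        days = int(day_part)
    seconds = 0
    for part in field.split(":"):
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


def find_orphans(ps_output, min_cpu=3600):
    """Return (pid, cpu_seconds, args) for orphaned, CPU-hungry ssh clients."""
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        # Skip the header line and anything that is not pid/ppid/time/args.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        pid, ppid, cpu, args = fields
        program = os.path.basename(args.split()[0])
        if int(ppid) == 1 and program == "ssh" and cpu_seconds(cpu) >= min_cpu:
            hits.append((int(pid), cpu_seconds(cpu), args))
    return hits


if __name__ == "__main__":
    import subprocess

    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,time,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, cpu, args in find_orphans(out):
        print(f"{pid}\t{cpu}s\t{args}")
```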
> >
> > I currently have 6 of those processes running on this system. Some of
> > them have been running for weeks, like the one above; others have been
> > running for days.
> >
> > This situation looks like a kernel bug to me. Do you have any idea what
> > in the code of this old version of OpenSSH might be triggering it?
> >
> >
> > Thanks,
> > Flavien.
> _______________________________________________
> openssh-unix-dev mailing list
> openssh-unix-dev at mindrot.org
> https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
>



-- 
# include <stddisclaimer.h>
/* Kevin  Brott <Kevin.Brott at gmail.com> */

