ControlMaster and ControlPersist lead to zombie processes

Christoph Anton Mitterer calestyo at scientia.net
Tue Apr 10 01:04:00 EST 2012


Hi.

Perhaps you can help me with this:

I use Nagios (actually Icinga) and have checks on remote hosts 
executed via ssh.
To speed the checks up dramatically (from about 0.300 s to 0.010 s) 
I use ControlMaster = auto, which means the mux process is spawned 
on the first check.
As checks are typically scheduled sequentially, I want the mux process 
to persist, but it should also go away automatically after some days if 
it is not re-used (e.g. when I no longer check a host).
So I have something like ControlPersist 2d.
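
For illustration, the same options can also be passed on the ssh command 
line by whatever wrapper runs the check. The following is only a minimal 
sketch: the function name, the ControlPath value, the host name and the 
remote plugin path are all assumptions and are not taken from the setup 
described here.

```python
import shlex


def build_ssh_command(host, remote_command,
                      control_path="~/.ssh/mux-%r@%h:%p",  # assumed socket path
                      persist="2d"):
    """Return an argv list for ssh with connection multiplexing enabled."""
    return [
        "ssh",
        "-o", "ControlMaster=auto",           # reuse a master, or become one
        "-o", f"ControlPath={control_path}",  # where the mux socket lives
        "-o", f"ControlPersist={persist}",    # idle master exits after this
        host,
        remote_command,
    ]


if __name__ == "__main__":
    # Hypothetical host and plugin path, for illustration only.
    cmd = build_ssh_command("monitored.example.org",
                            "/usr/lib/nagios/plugins/check_load")
    print(shlex.join(cmd))
```

The first invocation for a given host would start the mux process; later 
ones would reuse its socket for as long as the master is alive.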


Now I have stumbled across the following problem (and I'm actually not 
sure whether it's an ssh issue or an Icinga one):
The first time the check is run (which is when the mux process is 
spawned) it times out.
The mux process keeps running and everything works on subsequent 
checks.

The timeout (60 s) is enforced by Icinga when it thinks the command 
has not returned.


I ran some tests, and the following turns out to happen on the FIRST 
connection:
- the command is actually executed on the remote side
- on the local side, the ssh process (or a wrapper shell script around 
it) becomes a zombie as soon as the remote command has been executed
- after 60s, when Icinga enforces its timeout, the zombie goes away
- the (local) mux process continues to run


Any ideas why this could happen? Is there perhaps something that lets 
the parent processes notice that there is still a running child (i.e. 
the mux process)?
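
One mechanism that would produce these symptoms, offered only as a 
sketch and not confirmed for ssh or Icinga here: if a background process 
inherits the stdout/stderr pipe of the command that started it, a caller 
that reads until end-of-file keeps waiting after the command itself has 
exited, and the exited command stays a zombie until the caller reaps it. 
The Python example below reproduces that on a POSIX system with a 
stand-in child and a sleeping grandchild in place of ssh and its mux 
process; all names in it are hypothetical.

```python
import subprocess
import sys
import time

# Stand-in for the check command: it starts a lingering background
# process (the "mux" stand-in), prints its result and exits at once.
CHILD = """
import subprocess, sys
kwargs = {}
if sys.argv[1] == "detach":
    # Give the background process its own stdio instead of our pipe.
    kwargs = dict(stdin=subprocess.DEVNULL,
                  stdout=subprocess.DEVNULL,
                  stderr=subprocess.DEVNULL)
subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(" + sys.argv[2] + ")"],
    **kwargs)
print("check done")
"""


def wait_for_eof(mode, linger=2.0):
    """Run the stand-in check and read its output until end-of-file.

    mode is "inherit" or "detach". Returns (output, seconds_waited).
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, mode, str(linger)],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # read() returns only when every process holding the write end of
    # the pipe has closed it, not merely when the direct child exits.
    output = proc.stdout.read()
    proc.wait()  # reap the child; until here an exited child is a zombie
    return output, time.monotonic() - start


if __name__ == "__main__":
    for mode in ("inherit", "detach"):
        output, waited = wait_for_eof(mode)
        print(f"{mode}: output={output!r} waited={waited:.2f}s")
```

In "inherit" mode the caller should wait about as long as the background 
process lives, although the check itself finished immediately; in 
"detach" mode it should return as soon as the check exits.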


Thanks,
Chris.


More information about the openssh-unix-dev mailing list