how to speed up OpenSSH command execution (and a speed analysis)

Christoph Anton Mitterer calestyo at scientia.net
Mon Mar 26 03:36:54 EST 2012


Hi.

I recently investigated how to squeeze the last microseconds out of executing
commands on a remote host via OpenSSH (of course I'm using ControlMaster).


MOTIVATION:
I'm introducing Nagios (well, actually Icinga) at the local institute. We have
many active checks that must run locally on the remote hosts.
The "best" way to do this is using NRPE (Nagios Remote Plugin Executor), which
runs a daemon listening on a port, waiting for commands to be executed.

The problem with NRPE is that it's inherently insecure, even when using the
fake-SSL mode it provides (as extensively discussed in [0], [1] and [2]).
Also, executing commands on a remote host is business that "belongs" to SSH,
and NRPE more or less duplicates this.
Another reason why NRPE is broken is that the mode in which argument passing
(to the check scripts) is enabled is already marked as being insecure.

Why have NRPE then?
- It allows only certain commands to be executed.
   => With SSH this could be done too, I guess, by means such as rssh.
- It's much faster.
   => This is what I'm trying to "solve" here.

Why not use stunnel + NRPE?
=> This would still allow any local user on the remote host to contact the
    running NRPE daemon and execute commands. This might be a security risk,
    e.g. if NRPE has sudo rights or the like.

What's the goal?
- Drop NRPE and use SSH instead, if the latter can be made as fast (or
   nearly as fast) as NRPE.
- Use rssh to restrict the commands that may be run.
- Use SSH keys to allow the Nagios node to log in to the (rssh-restricted)
   remote host.
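
As an alternative to rssh (not what I tested above), the command restriction
could probably also be done with OpenSSH's own forced-command mechanism: a
command="..." option in authorized_keys, with sshd passing the requested
command in SSH_ORIGINAL_COMMAND. A minimal sketch follows; the wrapper path,
the key line and the helper names allowed_command/run_check are made up for
illustration, only the check_load definition is taken from this mail.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. installed as /usr/local/bin/nagios-ssh-wrapper and
# referenced from ~nagtest/.ssh/authorized_keys (placeholder key) like:
#   command="/usr/local/bin/nagios-ssh-wrapper",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAA... nagios
# sshd puts the command the client asked for into SSH_ORIGINAL_COMMAND.

PLUGIN_DIR=/usr/lib/nagios/plugins

# Map a requested check name to a fixed command line; refuse anything else.
allowed_command() {
    case "$1" in
        check_load) echo "$PLUGIN_DIR/check_load -w 15,10,5 -c 30,25,20" ;;
        *) return 1 ;;
    esac
}

# Run the whitelisted command, or answer with Nagios' UNKNOWN state (exit 3).
run_check() {
    if cmd=$(allowed_command "$1"); then
        # Intentionally unquoted: the fixed command line is split into words.
        exec $cmd
    else
        echo "UNKNOWN - command not allowed: $1"
        exit 3
    fi
}

# Only dispatch when invoked by sshd via a forced command.
if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
    run_check "$SSH_ORIGINAL_COMMAND"
fi
```

With that in place, "ssh nagtest@host.example.org check_load" would run the
fixed check_load command line, and any other request would be refused.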




USING CONTROLMASTER:
I guess it's inevitable to use ControlMaster for the connections from the
Nagios host to the remote hosts.

The actual connections for the commands usually close immediately, so a
spawner is required that keeps a connection open for ALL checked hosts.
I.e. something like:
for host in $hosts; do
	ssh -f -N "$host"
done
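
A slightly more careful spawner could first ask whether a master is already
alive (ssh -O check) and only start one if not, so that it can be re-run from
cron to restart dead masters. This is only a sketch: the SSH and HOSTS
variables and the function names are my own invention, and it assumes a
ControlPath is configured for the hosts.

```shell
#!/bin/sh
# Sketch of a master "spawner". SSH and HOSTS are hypothetical knobs.
: "${SSH:=ssh}"
: "${HOSTS:=host.example.org}"

# Start a background master for one host unless a live one already exists.
ensure_master() {
    target=$1
    # "-O check" asks an existing master (found via ControlPath) if it is alive.
    if $SSH -O check "$target" 2>/dev/null; then
        return 0
    fi
    # -M: act as master, -f: background after authentication, -N: no command.
    $SSH -M -f -N -o ServerAliveInterval=30 "$target"
}

# Walk over all hosts; report failures but keep going.
ensure_all_masters() {
    for host in $HOSTS; do
        ensure_master "$host" || echo "could not start master for $host" >&2
    done
}
```

Sourcing this file and calling ensure_all_masters every few minutes would
then take care of both the initial spawning and the restarting.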

Problems here:
- What other options to use (largely for the sake of speed and security)?
   * -o ServerAliveInterval=30 ?
   * -C ?
   * -a -k -x ?
   * others?
- How to spawn that first connection?
   I'd prefer that ssh had another mode, e.g. ControlMaster autospawn, which
   does roughly the following:
   When a "normal" command is executed for the first time, e.g.
     ssh example.host.org check_load
   it actually does a
     ssh -f -N host
   and uses that one to do the
     ssh example.host.org check_load
   That way I wouldn't have to take care of
   * spawning the master sessions
   * restarting them when they die for some reason
   and they would only be started when really required the first time.

   Ideally, there would be a way to time out those automatically spawned
   master sessions, e.g. stop one when it has not been used for a day.
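
For what it's worth, the ControlPersist option (available since OpenSSH 5.6)
appears to cover most of this wish: combined with ControlMaster auto, the
first ssh to a host leaves a master running in the background, and that master
exits after it has been idle for the given time. A sketch, assuming a client
new enough to support it; the one-day value is just the example from above,
and I have not measured this variant:

```
# Auto-spawn a master on first use and keep it while it is being used;
# 86400 seconds = stop the master after one day without client connections.
Host *
   ControlPath ~/.ssh/master-%l-%r@%h:%p
   ControlMaster auto
   ControlPersist 86400
```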




ANALYSIS:
I made some tests of the speed of command execution with NRPE, SSH, SSH+NRPE,
etc.:
The check_load command was defined as
   /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
in NRPE.
The sshd_config of the remote host is found below [3].
The nagtest user on the remote host has:
- this /etc/passwd entry:
   nagtest:x:54115:100::/home/nagtest:/bin/bash
- a .bashrc and .profile in his homedir


1) NRPE (with its fake-SSL mode) alone, no SSH or so at all:
# time /usr/lib/nagios/plugins/check_nrpe -H host.example.org -c check_load
OK - load average: 0.00, 0.02, 0.00|load1=0.000;15.000;30.000;0; load5=0.020;10.000;25.000;0; load15=0.000;5.000;20.000;0;
real	0m0.047s	user	0m0.000s	sys	0m0.004s

# time /usr/lib/nagios/plugins/check_nrpe -H host.example.org -c check_load
OK - load average: 0.00, 0.02, 0.00|load1=0.000;15.000;30.000;0; load5=0.020;10.000;25.000;0; load15=0.000;5.000;20.000;0;
real	0m0.008s	user	0m0.004s	sys	0m0.000s

=> The first time it's quite slow, I guess because of the DNS lookup, but
    subsequent invocations are really fast (0.008s).


2) NRPE (withOUT its fake-SSL mode) alone, no SSH or so at all:
# time /usr/lib/nagios/plugins/check_nrpe -H host.example.org -c check_load -n
OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;25.000;0; load15=0.000;5.000;20.000;0;
real	0m0.006s	user	0m0.004s	sys	0m0.000s

=> It's even faster than (1). So given that NRPE's SSL is absolutely useless
    anyway, one should always just disable it.


3) NRPE (withOUT its fake-SSL mode), tunneling the connection over SSH via
    port-forwarding, NO(!) ControlMaster set:
# ssh nagtest@host.example.org -L 2000:host.example.org:5666 -N

(running everything under the nagtest user; the NRPE daemon listens on port
5666)

(running check_nrpe against localhost:2000 in order to use the port-forwarding)
# time /usr/lib/nagios/plugins/check_nrpe -p 2000 -H localhost -c check_load -n
OK - load average: 0.31, 0.07, 0.02|load1=0.310;15.000;30.000;0; load5=0.070;10.000;25.000;0; load15=0.020;5.000;20.000;0;
real	0m0.023s	user	0m0.004s	sys	0m0.000s
real	0m0.010s	user	0m0.004s	sys	0m0.000s
real	0m0.017s	user	0m0.004s	sys	0m0.000s
real	0m0.006s	user	0m0.004s	sys	0m0.000s

=> On the first few invocations, the time varied quite a lot (perhaps the
    remote system was under load).
    But then it got as fast as NRPE without SSH tunneling!
    This is really interesting, as it shows, I guess, that it's not the
    encryption layer of SSH that makes things slow.

Side note:
Why don't I just stop here and use NRPE tunneled over SSH?
Because NRPE would still be insecure and could be invoked on the localhost by
other users.


4) From now on, no more NRPE.
    Plain SSH, no special options, no ControlMaster, obviously no
    port-forwarding:
# time ssh nagtest@host.example.org /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
OK - load average: 0.01, 0.05, 0.05|load1=0.010;15.000;30.000;0; load5=0.050;10.000;25.000;0; load15=0.050;5.000;20.000;0;
real	0m0.126s	user	0m0.036s	sys	0m0.000s
real	0m0.169s	user	0m0.036s	sys	0m0.000s

=> Once it was "fast" (0.126s), but all other times I've tested it was
    around 0.169s.
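
Since single runs vary this much, repeating each measurement and looking at
the minimum and mean is probably more telling than one "time" call. A rough
helper for that is sketched below; the name bench is made up, it relies on GNU
date's %N (nanoseconds), and forking date adds some overhead of its own, so
the absolute numbers will be a bit higher than what "time" reports.

```shell
#!/bin/sh
# bench N CMD...: run CMD N times, print the wall-clock time of each run in
# milliseconds, then the minimum and the mean.
# Assumption: "date +%s%N" works (GNU date); it does not on all systems.
bench() {
    n=$1; shift
    i=0; total=0; min=
    while [ "$i" -lt "$n" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s%N)
        ms=$(( (end - start) / 1000000 ))
        echo "run $((i + 1)): ${ms} ms"
        total=$((total + ms))
        # Track the fastest run seen so far.
        if [ -z "$min" ] || [ "$ms" -lt "$min" ]; then min=$ms; fi
        i=$((i + 1))
    done
    echo "min: ${min} ms, mean: $((total / n)) ms"
}
```

E.g.: bench 20 ssh nagtest@host.example.org /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20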


ControlMaster setup:
Host *
   ControlPath ~/.ssh/master-%l-%r@%h:%p
   ControlMaster auto


5) SSH with ControlMaster:
Opening the background control master:
# ssh -f -N nagtest@host.example.org

# time ssh nagtest@lcg-lrz-dc20.grid.lrz-muenchen.de /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;25.000;0; load15=0.000;5.000;20.000;0;
real	0m0.045s	user	0m0.004s	sys	0m0.000s

=> Fastest result with SSH so far.


6) SSH with ControlMaster but dash as shell:
I thought maybe it's bash that is slow, so I changed the user's shell to
"dash" in /etc/passwd.
First I found out that this only takes effect when the ControlMaster is
restarted... why?
But apart from that, it had no impact on speed.


7) SSH with ControlMaster but ash as shell:
I made a test with ash as shell, where I actually got down to 0.006s.
But I couldn't reproduce this later.


8) SSH with ControlMaster but /bin/true as shell:
# time ssh nagtest@lcg-lrz-dc20.grid.lrz-muenchen.de /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

real	0m0.040s	user	0m0.004s	sys	0m0.000s

=> As only true is executed, and no shell config files are read, it seems
    that the problem is not related to shell startup.


MISCELLANEOUS:
Are there any further ways to speed things up?
* I think disabling UseDNS isn't of much use, as it only affects the first
   (control master) connection, right?
* Are there any ways to, e.g., speed up the choice of the identity file?
   Or to disable everything but SSH keys?
   Etc.
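
On the client side, the identity file can at least be pinned and the other
authentication methods switched off with standard ssh_config options, as
sketched below. The key path ~/.ssh/id_nagios is a made-up example. I'd expect
this to matter only when a master connection is set up, since sessions
multiplexed over an existing master don't authenticate again; I have not
measured it.

```
# Offer only this one key and try nothing but public-key authentication.
Host host.example.org
   User nagtest
   IdentityFile ~/.ssh/id_nagios
   IdentitiesOnly yes
   PreferredAuthentications publickey
   GSSAPIAuthentication no
```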




So the question in the end is: can I somehow speed things up even more?
If you need any further analysis work, just tell me.


Thanks,
Chris.



[0] http://tracker.nagios.org/view.php?id=90
[1] http://tracker.nagios.org/view.php?id=125
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=547092
[3] AllowUsers root nagtest
     ChallengeResponseAuthentication no
     PasswordAuthentication no
     RSAAuthentication no
     Protocol 2
     Ciphers aes256-cbc,aes192-cbc,aes128-cbc,aes256-ctr,aes192-ctr,aes128-ctr,blowfish-cbc
     MACs hmac-sha1,hmac-ripemd160
     ClientAliveInterval 30
     TCPKeepAlive no
     AcceptEnv LANG LC_*
     X11Forwarding yes
     Subsystem sftp /usr/lib/openssh/sftp-server
=> I really wouldn't want to change the Ciphers to something weaker!


More information about the openssh-unix-dev mailing list