how to speed up OpenSSH command execution (and a speed analysis)
Christoph Anton Mitterer
calestyo at scientia.net
Mon Mar 26 03:36:54 EST 2012
Hi.
I recently did some investigation into how to squeeze the last microseconds out
of executing commands via OpenSSH on a remote host (of course I'm using
ControlMaster).
MOTIVATION:
I'm introducing Nagios (well, actually Icinga) at the local institute. We have
many active checks that must run locally on the remote hosts.
The "best" way to do this is using NRPE (Nagios Remote Plugin Executor), which
runs a daemon listening on a port, waiting for commands to be executed.
The problem with NRPE is that it's inherently insecure (even when using the
fake-SSL mode it provides), as extensively discussed in [0], [1] and [2].
Also, executing commands on a remote host is business that "belongs" to SSH, and
NRPE more or less duplicates this.
Another reason why NRPE is broken is that the mode in which argument passing
(to the check scripts) is enabled is already marked as being insecure.
Why have NRPE then?
- It allows only certain commands to be executed.
  => With SSH this could be done too, I guess, by means such as rssh.
- It's much faster.
  => This is what I'm trying to "solve" here.
Why not use stunnel + NRPE?
=> This would still allow any local user on the remote host to contact the
   running NRPE daemon and execute commands. This might be a security risk,
   e.g. if NRPE has sudo rights or so.
What's the goal?
- Drop NRPE and use SSH instead, if the latter can be made as fast (or
  nearly as fast) as NRPE.
- Use rssh to restrict the commands that may be run.
- Use SSH keys to allow the Nagios node to log in to the (rssh-restricted)
  remote host.
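One way the "allow only certain commands" restriction might be sketched is a forced command in authorized_keys (the command="..." key option), where sshd runs a wrapper and passes the requested command in SSH_ORIGINAL_COMMAND. Note that this swaps in a different mechanism from rssh. The wrapper path and the allowed list below are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. installed as /usr/local/bin/nagios-wrapper and
# referenced from ~nagtest/.ssh/authorized_keys as:
#   command="/usr/local/bin/nagios-wrapper",no-port-forwarding,no-pty ssh-rsa AAAA...
# sshd puts the command requested by the client into SSH_ORIGINAL_COMMAND.

PLUGIN_DIR=/usr/lib/nagios/plugins   # plugin path as used in this mail
ALLOWED="check_load check_disk"      # assumed whitelist, adjust as needed

# Print the plugin name if the requested command is allowed; fail otherwise.
allowed_plugin() {
    name=${1%% *}       # first word of the request
    name=${name##*/}    # strip any directory part
    for p in $ALLOWED; do
        if [ "$p" = "$name" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Validate a request and run the plugin from PLUGIN_DIR with its arguments.
run_request() {
    case $1 in
        # Reject anything outside a conservative character set
        # (no quotes, semicolons, globs, pipes, ...).
        *[!A-Za-z0-9_,.\ /-]*) echo "rejected: unsafe characters" >&2; return 1 ;;
    esac
    plugin=$(allowed_plugin "$1") || { echo "rejected: not allowed" >&2; return 1; }
    args=${1#* }
    [ "$args" = "$1" ] && args=      # request had no arguments
    # $args is deliberately unquoted so it splits into separate arguments.
    "$PLUGIN_DIR/$plugin" $args
}

if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
    run_request "$SSH_ORIGINAL_COMMAND"
fi
```

With such a key entry, "ssh nagtest@host.example.org check_load -w 15,10,5 -c 30,25,20" would run the plugin, while any other command would be refused.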
USING CONTROLMASTER:
I guess it's inevitable to use ControlMaster for the connections from the Nagios
host to the remote hosts.
The actual connections for the commands usually close immediately, so a spawner
is required that keeps a connection open for ALL checked hosts.
I.e. something like:
  # $hosts: whitespace-separated list of the checked hosts (placeholder)
  for host in $hosts; do
      ssh -f -N "$host"
  done
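A minimal sketch of such a spawner, assuming ControlMaster and ControlPath are already set in the Nagios user's ~/.ssh/config. It uses "ssh -O check" to ask whether a master is still alive and starts one only if not. The function name and the SSH variable are illustrative, not part of OpenSSH.

```shell
#!/bin/sh
# Sketch: make sure a background ControlMaster exists for every checked host.
SSH=${SSH:-ssh}    # overridable so the loop can be dry-run with a stub

ensure_master() {
    host=$1
    # "ssh -O check" asks an existing master process whether it is running.
    if "$SSH" -O check "$host" >/dev/null 2>&1; then
        echo "master for $host is running"
    else
        echo "starting master for $host"
        # -f: go to background after authentication, -N: no remote command
        "$SSH" -f -N "$host"
    fi
}

# Hosts are taken from the command line.
for host in "$@"; do
    ensure_master "$host"
done
```

Run periodically (for example from cron) with the host list as arguments, this would also restart masters that have died.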
Problems here:
- What other options to use (largely for the sake of speed and security)?
  * -o ServerAliveInterval=30 ?
  * -C ?
  * -a -k -x ?
  * others?
- How to spawn that first connection?
  I'd prefer that ssh had another mode, e.g. ControlMaster autospawn, which
  does roughly the following:
  The first time a "normal" command is executed, e.g.
    ssh example.host.org check_load
  it actually does a
    ssh -f -N host
  and uses that one to do the
    ssh example.host.org check_load
  That way I wouldn't have to take care of
  * spawning the master sessions
  * restarting them when they die for some reason
  * they would only be started when first really required
  Ideally, there would be a way to time out those automatically spawned master
  sessions, e.g. stop one when it has not been used for a day.
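The "autospawn" behaviour wished for here looks close to what the ControlPersist option provides in OpenSSH 5.6 and later: with ControlMaster=auto, the first connection starts a master that moves to the background and exits after it has been idle for the given time. A minimal sketch, assuming such a version on the Nagios host; the function name, the SSH variable and the socket path are only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: run one check through an auto-spawned, persistent master.
SSH=${SSH:-ssh}    # overridable for dry runs

run_check() {
    host=$1; shift
    # ControlMaster=auto : reuse a master if one exists, else become one
    # ControlPersist=1d  : keep the master in the background, stop it after
    #                      one day without client connections
    "$SSH" \
        -o ControlMaster=auto \
        -o ControlPath="$HOME/.ssh/master-%r@%h:%p" \
        -o ControlPersist=1d \
        "$host" "$@"
}

# Usage (host and plugin path as used elsewhere in this mail):
# run_check nagtest@host.example.org /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
```

The same three options could equally go into a "Host *" block in ~/.ssh/config, so that plain "ssh host command" picks them up.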
ANALYSIS:
I made some tests on the speed of command execution with NRPE, SSH, SSH+NRPE,
etc.:
The check_load command was defined as
  /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
in NRPE.
The sshd_config of the remote host is found below [3].
The nagtest user on the remote host has:
- this /etc/passwd entry:
  nagtest:x:54115:100::/home/nagtest:/bin/bash
- a .bashrc and .profile in its home directory
1) NRPE (with its fake-SSL mode) alone, no SSH or so at all:
# time /usr/lib/nagios/plugins/check_nrpe -H host.example.org -c check_load
OK - load average: 0.00, 0.02, 0.00|load1=0.000;15.000;30.000;0;
load5=0.020;10.000;25.000;0; load15=0.000;5.000;20.000;0;
real 0m0.047s user 0m0.000s sys 0m0.004s
# time /usr/lib/nagios/plugins/check_nrpe -H host.example.org -c check_load
OK - load average: 0.00, 0.02, 0.00|load1=0.000;15.000;30.000;0;
load5=0.020;10.000;25.000;0; load15=0.000;5.000;20.000;0;
real 0m0.008s user 0m0.004s sys 0m0.000s
=> The first time it's quite slow, I guess because of the DNS lookup, but
   subsequent invocations are really fast (0.008s).
2) NRPE (withOUT its fake-SSL mode) alone, no SSH or so at all:
# time /usr/lib/nagios/plugins/check_nrpe -H host.example.org -c check_load -n
OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0;
load5=0.000;10.000;25.000;0; load15=0.000;5.000;20.000;0;
real 0m0.006s user 0m0.004s sys 0m0.000s
=> It's even faster than (1). So given that NRPE's SSL is absolutely useless
   anyway, one should always just disable it.
3) NRPE (withOUT its fake-SSL mode), tunneling the connection over SSH
via port forwarding, NO(!) ControlMaster set:
# ssh nagtest@host.example.org -L 2000:host.example.org:5666 -N
(running everything under the nagtest user; the NRPE daemon listens on
port 5666)
(running check_nrpe against localhost:2000 in order to use the port forwarding)
# time /usr/lib/nagios/plugins/check_nrpe -p 2000 -H localhost -c check_load -n
OK - load average: 0.31, 0.07, 0.02|load1=0.310;15.000;30.000;0;
load5=0.070;10.000;25.000;0; load15=0.020;5.000;20.000;0;
real 0m0.023s user 0m0.004s sys 0m0.000s
real 0m0.010s user 0m0.004s sys 0m0.000s
real 0m0.017s user 0m0.004s sys 0m0.000s
real 0m0.006s user 0m0.004s sys 0m0.000s
=> On the first few invocations, the time varied quite a lot (perhaps the remote
   system was under load).
   But then it got as fast as NRPE without SSH tunneling!
   This is really interesting, as it shows, I guess, that it's not the
   encryption layer of SSH that makes things slow.
Side note:
Why don't I just stop here and use NRPE tunneled over SSH?
Because NRPE would still be insecure and could be invoked on the localhost by
other users.
4) From now on, no more NRPE.
Plain SSH, no special options, no ControlMaster, obviously no port forwarding:
# time ssh nagtest@host.example.org /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
OK - load average: 0.01, 0.05, 0.05|load1=0.010;15.000;30.000;0;
load5=0.050;10.000;25.000;0; load15=0.050;5.000;20.000;0;
real 0m0.126s user 0m0.036s sys 0m0.000s
real 0m0.169s user 0m0.036s sys 0m0.000s
=> Once it was "fast" (0.126s), but all the other times I tested, it was
   around 0.169s.
ControlMaster setup (in ~/.ssh/config):
  Host *
    ControlPath ~/.ssh/master-%l-%r@%h:%p
    ControlMaster auto
5) SSH with ControlMaster:
Opening the background control master:
# ssh -f -N nagtest@host.example.org
# time ssh nagtest@lcg-lrz-dc20.grid.lrz-muenchen.de /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0;
load5=0.000;10.000;25.000;0; load15=0.000;5.000;20.000;0;
real 0m0.045s user 0m0.004s sys 0m0.000s
=> Fastest result with SSH so far.
6) SSH with ControlMaster but dash as shell
I thought maybe it's bash that is slow, so I changed the user's shell to "dash"
in /etc/passwd.
First I found out that this only takes effect when the ControlMaster is
restarted... why?
But apart from that, it had no impact on speed.
7) SSH with ControlMaster but ash as shell
I made a test with ash as shell, where I actually got down to 0.006s.
But I couldn't reproduce this later.
8) SSH with ControlMaster but /bin/true as shell
# time ssh nagtest@lcg-lrz-dc20.grid.lrz-muenchen.de /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
real 0m0.040s user 0m0.004s sys 0m0.000s
=> As only true is executed, and no shell config files are read, it seems
   that the problem is not related to shell startup.
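Since single "time" runs vary a lot (see cases 3 and 4), averaging over many runs may give steadier numbers. A small sketch, assuming GNU date (for the %N nanosecond format) as found on Debian-like systems; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: average wall-clock time of a command over several runs.

bench() {
    runs=$1; shift
    start=$(date +%s%N)          # nanoseconds since the epoch (GNU date)
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output and exit status of the command are ignored on purpose.
        "$@" >/dev/null 2>&1 || true
        i=$((i + 1))
    done
    end=$(date +%s%N)
    # Average per run, in microseconds.
    echo $(( (end - start) / runs / 1000 ))
}

# Usage (host and plugin path as in the tests above):
# bench 100 ssh nagtest@host.example.org /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
```

Like "time", this measures the whole client invocation, including starting the local ssh process, not only the remote execution.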
MISCELLANEOUS:
Are there any further ways to speed things up?
* I think disabling UseDNS isn't of much use, as it only affects the
  first control master connection, right?
* Are there any ways, e.g., to speed up the choice of the identity file?
  Or to disable everything but SSH keys?
etc.
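For the identity-file question, the client options below restrict authentication to one key. Sessions multiplexed over an existing master reuse its already authenticated connection, so this would presumably only shorten the master's own startup. A sketch only: the key path and function name are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: start a master that offers exactly one key.
SSH=${SSH:-ssh}                                    # overridable for dry runs
NAGIOS_KEY=${NAGIOS_KEY:-$HOME/.ssh/id_nagios}     # assumed key location

start_master() {
    host=$1
    # PreferredAuthentications=publickey : skip all other auth methods
    # IdentitiesOnly=yes                 : offer only the key named below,
    #                                      not every key known to an agent
    "$SSH" -f -N \
        -o PreferredAuthentications=publickey \
        -o IdentitiesOnly=yes \
        -o IdentityFile="$NAGIOS_KEY" \
        "$host"
}

# Usage: start_master nagtest@host.example.org
```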
So the question in the end is: can I somehow speed things up even more?
If you need any further analysis work, just tell me.
Thanks,
Chris.
[0] http://tracker.nagios.org/view.php?id=90
[1] http://tracker.nagios.org/view.php?id=125
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=547092
[3] AllowUsers root nagtest
    ChallengeResponseAuthentication no
    PasswordAuthentication no
    RSAAuthentication no
    Protocol 2
    Ciphers aes256-cbc,aes192-cbc,aes128-cbc,aes256-ctr,aes192-ctr,aes128-ctr,blowfish-cbc
    MACs hmac-sha1,hmac-ripemd160
    ClientAliveInterval 30
    TCPKeepAlive no
    AcceptEnv LANG LC_*
    X11Forwarding yes
    Subsystem sftp /usr/lib/openssh/sftp-server
=> I really wouldn't want to change the Ciphers to something weaker!