SSH via redundant login-nodes (with and without control channel multiplexing)

Christoph Anton Mitterer calestyo at scientia.net
Tue Dec 2 15:44:54 EST 2014


Hi.

I've recently been playing a lot with control channel multiplexing and
how it can be used to improve our local setup (ideally safely and
automatically for all users).
What we have here at the faculty are many nodes (thousands), none of
which are directly reachable via SSH; one can only reach them by hopping
over a login node, a setup which brings several advantages.

Of these login nodes we have several (for availability reasons), e.g.:
login-1.example.org
login-2.example.org
and all of them are reachable via a round-robin domain containing all
the A and AAAA RRs of the above nodes:
login.example.org.
Because of the round-robin domain name, all these nodes have the same
SSH host key pair.

What I now ideally want is that SSH automatically picks one of the
login nodes (ideally also in a round-robin fashion), and that all of
this just works gracefully if one of them isn't reachable or becomes
unresponsive.


I'd put something like this into the user's ssh_config:
---------------------- 
Host login.example.org login-1.example.org login-2.example.org
        ProxyCommand none
        ControlMaster auto
        ControlPersist 1m

Host *.example.org
        ControlPath ~/.ssh/control-mux/%h
#1#   ProxyCommand sh -c "ssh -W %h:%p login-1.example.org  ||  ssh -W %h:%p login-2.example.org"
#2#   ProxyCommand ssh -W %h:%p login.example.org
#3#   ProxyCommand sh -c "ssh -o ConnectTimeout=10 -W %h:%p login-1.example.org || ssh -o ConnectTimeout=10 -W %h:%p login-2.example.org"
#4#   ProxyCommand ssh -o ConnectTimeout=10 -W %h:%p login.example.org
----------------------

So I played around a bit with all that (both with and without control
channel multiplexing) and here are the results, questions and issues
I've encountered:


1) without control channel multiplexing (just strip any Control* options
from the config above)
At first I used a ProxyCommand of sh -c "ssh -W %h:%p
login-1.example.org  ||  ssh -W %h:%p login-2.example.org", which works
more or less fine: if SSH to login-1 doesn't work (for whatever reason:
node down, authentication issue, sshd not running), login-2 will be
tried. Great, but the downsides are: one always has to add all the login
nodes to the command, there is no load balancing due to the strict
ordering, and an extra sh is run.

By chance I found out that it actually also works with the round-robin
domain name (i.e. ProxyCommand ssh -W %h:%p login.example.org), well, at
least I've tried it with 2 A RRs (and in my tests I've used -4 with ssh).
I tested by DROPing or REJECTing[1] packets to one or the other login
node via iptables; see the example rules below.
Apparently, ssh picks the first A RR given by the resolver, and if that
"doesn't work", it tries the next. One can see in the iptables counters
that on some connections the DROP or REJECT rule was hit (i.e. ssh tried
the "down" node first) and sometimes not (i.e. it immediately chose the
"up" node).
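
For reference, the rules were along these lines (the addresses being
placeholders for the login nodes' real ones):
----------------------
# Simulate login-1 silently swallowing packets (one runs into SSH's
# timeouts) and login-2 actively refusing (immediate ICMP error):
iptables -A OUTPUT -p tcp -d 192.0.2.1 --dport 22 -j DROP
iptables -A OUTPUT -p tcp -d 192.0.2.2 --dport 22 -j REJECT
----------------------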
Fine, but:
This behaviour (of trying more than one A/AAAA RR) is not really
documented anywhere in OpenSSH (and documenting it would be really
nice):
- Does it work only for 2 RRs (as in my test)? Does it really try all
the A and AAAA RRs?
- In which cases does it try the other address RRs? Only when the node
wasn't reachable (i.e. a negative ICMP answer), or also in case of a
timeout, an authentication failure or any other error?
- Doesn't this somehow contradict the default of ConnectionAttempts=1,
since it actually makes more than just one attempt? I mean, what if some
domain name contains 1 million A RRs? Actually it seems that it even
sends two packets *per* address; is this simply needed for the attempted
handshake, or is this a bug?

Another open question is whether using the round-robin name can be made
to work if the login-* nodes do *not* use the same host key pair.
So what one wants is something like this:
Host login.example.org
       HostKeyAlias login-1.example.org
       HostKeyAlias login-2.example.org
in the sense that either key would be accepted. Does that work, or could
it be implemented?
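
(For what it's worth, a workaround that should already work today, if
I'm not mistaken: put every node's key into known_hosts under the
round-robin name; AFAIU ssh accepts a host key if it matches *any* of
the entries for that name. The key material below is obviously just a
placeholder:
----------------------
login.example.org ssh-rsa AAAA...key-of-login-1...
login.example.org ssh-rsa AAAA...key-of-login-2...
----------------------
But something like a multi-valued HostKeyAlias would be much nicer to
maintain.)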


2) with control channel multiplexing
Here, of course, things get much trickier.

The first thing one notices is that the control socket is always created
based on the name of the host. In the case of the round-robin domain
this means that, again, only one login node will actually be used (the
one the socket was opened to), so all load-balancing efforts are
basically destroyed again.
Any ideas how to solve that? Perhaps by adding %X-style tokens which
expand not to the hostname but to the v4 or v6 address that was used to
connect? This would have the further advantage that it would then also
work for the same host reached via different names (CNAMEs and the
like).
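
Purely hypothetically, if such a token existed (it does *not* today),
the config could then simply be:
----------------------
Host *.example.org
        # %X = hypothetical token expanding to the remote address that
        # was actually connected to, giving one socket per login node:
        ControlPath ~/.ssh/control-mux/%X
----------------------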

Apart from that, the different variants above (#1# and #2#) work just as
one would expect... if I REJECT access, it immediately tries the other
one; if I DROP access, it takes ages until TCP times out.

Another question one could ask is: how does all that behave if an
existing socket becomes unresponsive?

The first thing I've noted is that if I use REJECT to block any further
access to the socket's server side (sshd), the socket/mux process isn't
terminated immediately (even though that should probably be the way to
go?). If one uses DROP, it takes whatever time it needs to time out,
depending on TCP keepalives and/or the ServerAlive* options.
The mux connections seem to behave just like a normal SSH connection
with respect to ServerAlive* - i.e. after the timeout, the mux is
killed, and any ssh processes using it as well.
I've disabled TCP keepalives, and my ServerAlive* options are set to
allow at most 2 minutes of no reply (which is desired in order not to
kill off hanging connections too early). So basically, lowering the
timeout is not an alternative if one wants to give hanging sessions a
chance to recover.
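
(Concretely, the settings in question would look like this; the exact
interval/count values are assumed here, only their ~2 min product
matters:
----------------------
Host *.example.org
        TCPKeepAlive no
        # 4 missed probes, 30 s apart: the connection is only declared
        # dead after ~2 minutes without any reply from the server.
        ServerAliveInterval 30
        ServerAliveCountMax 4
----------------------
)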

Another thing I've observed during DROP/REJECT of an already existing
mux:
OpenSSH's documentation basically says "if there is a mux path
configured and the socket exists, we try to use it; if that doesn't
work, we connect normally". But what apparently happens is: as soon as
the socket exists and ssh can connect to the socket, it won't fall back
to "normal", even if the socket's connection is already dead.
So what I did was: ssh to the same host using the existing socket (whose
connection is however blocked via iptables, either with DROP or
REJECT)... the new ssh happily connects to the socket, and after it (or
the mux process) times out... it fails and does *not* connect
normally :-(
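
The only manual escape hatches I'm aware of are along these lines:
----------------------
# Ask whether the mux master is (still) running - note this only checks
# the local master process, not whether its TCP connection is alive:
ssh -O check somenode.example.org
# Bypass a (possibly dead) socket for this one connection:
ssh -o ControlPath=none somenode.example.org
----------------------
but that's of course nothing one can ask ordinary users to do.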

Now, even though I want to keep my (probably just hanging) muxes and
their sessions for my long timeout period, I still want any *new*
connections to try the other login nodes first (maybe those work
immediately). If the old one recovers, fine: continue to use that one
for the old connections, and use the new one for the new connections.[0]
Of course I cannot solve this with ServerAlive* or TCP keepalive
timeouts... even if it worked technically, any such new connection would
then have a lower timeout (which I no longer want once the connection is
established).
I hoped ConnectTimeout could do the job for me.
So I tried #3# and #4# from the config example above,... but
unfortunately: ConnectTimeout seems not to apply when an existing
control mux socket is used :-(
The question here is basically: could it be implemented that
ConnectTimeout also works for sockets - in the sense of the time it
takes to open the socket, talk to its socket server (the mux process)
and finally get the okay answer from the remote sshd that a new session
is there?
Because if that worked (and also for the round-robin thingy), one would
basically have a way for *completely established* connections to retain
their long timeouts (via ServerAlive*), while trying to establish such a
connection gets the short timeout from ConnectTimeout - thus, if my
existing socket on login-1 just hangs for a while, I get a new one on
login-2 (which may not be hanging).
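
In other words, the behaviour I'm after would correspond to something
like this (if ConnectTimeout applied to the mux socket path as well,
which it currently doesn't):
----------------------
Host *.example.org
        ControlPath ~/.ssh/control-mux/%h
        # long timeout for already established sessions:
        ServerAliveInterval 30
        ServerAliveCountMax 4
        # short timeout when (re-)establishing a connection - this
        # would need to cover the control socket case, too:
        ConnectTimeout 10
----------------------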


Obviously, a tricky part of the whole thing is still how to use a
round-robin name with multiple sockets... as described in [0], and
especially how to do so without accidentally opening up any tricky ways
to exploit this security-wise.


Cheers,
Chris.



[0] Here a problem with my suggestion to use the v4/v6 address as the
socket name becomes clear: as the resolver gives back the addresses in
varying order, sooner or later both would be used, which somewhat
defeats the idea of muxing... I'm not sure whether there is an easy (and
especially secure) way around this. Maybe ssh could check whether a
socket already exists that matches one of the hostname's addresses,...
but this seems security-prone (what if DNS changes in the meantime,...
then perhaps ssh tricks itself into using the wrong host). So maybe
another way would be to use not the address, but a hash of the host's
host key + the address family?
[1] DROP / REJECT in the sense of Linux netfilter, i.e. the respective
iptables keywords. DROP just silently discards (i.e. one can only run
into the (possibly long) timeouts of SSH),... REJECT sends an ICMP
packet back to the client (i.e. one times out quite fast).