Jochen Bern Jochen.Bern at binect.de
Wed Nov 10 03:35:48 AEDT 2021

This has got to be one of the weirdest problem descriptions I've ever 
dared publish ...

Yesterday evening, I had problems SSHing from a jump host through an 
IPsec VPN to a couple of customer servers (everything running CentOS 7). I 
was able to work around the problem by fiddling with the crypto 
settings; some more details below.

This morning, those connections were back to normal, but the support 
engineer on duty reported that he could not SSH into an entirely 
different server 
(also CentOS 7, and straight from his workplace machine); that problem 
fixed itself a couple hours later, too.

Is this just the spookiest coincidence since last Halloween, or did we 
chance onto a rare, time-triggered malfunction somewhere in the 
OpenSSH(/OpenSSL?) crypto ... ?


Alas, the support engineer isn't up to SSH connection debugging, so he 
never ran ssh with -vv and couldn't report any symptom beyond "it times 
out". I failed to save my own -vv output, but I remember that roughly 
where you'd normally get to the KEXINIT, my client claimed to be waiting 
for some ECDH - and then just sat there until the timeout.
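
Since the -vv output got lost this time, here is a minimal sketch of how 
it could be kept on the next occurrence. The function name ssh_debug and 
the SSH_DEBUG_DIR variable are made up for illustration; only ssh's -vv 
flag and plain stderr redirection are relied upon.

```shell
# Hypothetical wrapper: run ssh with -vv and keep the debug output
# (which ssh writes to stderr) in a timestamped file.
# SSH_DEBUG_DIR is an assumed variable; it defaults to the current directory.
ssh_debug() {
    log="${SSH_DEBUG_DIR:-.}/ssh-debug-$(date +%Y%m%d-%H%M%S).log"
    ssh -vv "$@" 2>"$log"
    status=$?
    echo "debug output saved to $log" >&2
    return "$status"
}
```

It would be called like ssh itself, e.g. "ssh_debug $SERVER", and the 
resulting file can then be attached to a report.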

I usually have two keypairs - one ed25519, one RSA - loaded into my 
agent, and now that things are back to normal, the Kex chosen is 
curve25519-sha256@libssh.org. In order to circumvent the problem, I had 
to remove my RSA keypair from the agent and use

> $ ssh -o "KexAlgorithms diffie-hellman-group-exchange-sha256" $SERVER

to get logged in.
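
To narrow down which key exchange algorithms hang and which go through, 
the same -o option can be tried in a loop. What follows is only a sketch: 
probe_kex is a made-up name, the 20 second limit is arbitrary, and it 
assumes timeout(1) from coreutils is installed and that key-based login 
works without prompting (hence BatchMode).

```shell
# Hypothetical helper: try each key exchange algorithm in turn against
# one host and report which ones complete a login within the time limit.
# Usage: probe_kex HOST KEX [KEX...]
probe_kex() {
    host=$1
    shift
    for kex in "$@"; do
        # timeout(1) kills ssh if the handshake hangs; 20 seconds is an
        # arbitrary illustrative limit. BatchMode avoids password prompts.
        if timeout 20 ssh -o BatchMode=yes -o "KexAlgorithms $kex" \
                "$host" true </dev/null >/dev/null 2>&1; then
            echo "$kex: ok"
        else
            echo "$kex: failed or timed out"
        fi
    done
}
```

A call would look like "probe_kex $SERVER curve25519-sha256@libssh.org 
diffie-hellman-group-exchange-sha256"; the algorithm names the local 
client supports are listed by "ssh -Q kex".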

I started haveged on "my" target machines, but 
/proc/sys/kernel/random/entropy_avail reported > 3kbit anyway and my 
colleague's remote system had haveged running already, so I doubt that 
that actually did anything.
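
For reference, the entropy check amounts to reading that one file. A 
small sketch, with made-up function names and an arbitrary default 
threshold of 1000 bits that is not a recommendation from anywhere:

```shell
# Hypothetical helper: print the kernel's entropy estimate in bits.
# The optional argument overrides the file to read.
entropy_bits() {
    cat "${1:-/proc/sys/kernel/random/entropy_avail}"
}

# Hypothetical check: succeed if the estimate is at least the given
# number of bits (second argument, illustrative default 1000).
entropy_ok() {
    [ "$(entropy_bits "$1")" -ge "${2:-1000}" ]
}
```

Running "entropy_ok && echo enough" on the target machine gives a quick 
yes/no on whether the pool looks starved.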

Our monitoring and automated data fetchers apparently never had any 
problem SSHing into those servers - using RSA keypairs. The server, set 
to LogLevel VERBOSE and typically logging

> Connection from $CLIENT_IP ...
> Postponed publickey for $LOCAL_USER ...

at the beginning of a connection, never wrote the second line for the 
failed attempts. (With all our accesses getting SNATed, I'm not sure yet 
whether there are any dangling instances of the *first* line.)

There is nothing in hosts.allow/hosts.deny, and DNS lookups of the 
client IP normally return NXDOMAIN.

Thanks for any pointers,
Jochen Bern

Binect GmbH
