Temporary Crypto Glitches ... ??

Thu Nov 11 22:49:55 AEDT 2021

Hi Jochen

We run a few thousand hosts on internet lines of varying quality.
If a connection fails halfway through, our fallback procedure is to
retry using only ed25519 crypto. The reason is that it needs only
smaller packets, which can help if there is (more) trouble with
bigger network packets.

Cheers

Konrad

On 09.11.21 17:35, Jochen Bern wrote:
> This has got to be one of the weirdest problem descriptions I've ever 
> dared publish ...
> 
> Yesterday evening, I had problems SSHing from a jump host through an 
> IPsec VPN to a couple of customer servers (everything running CentOS 7). I
> was able to work around the problem by fiddling with the crypto 
> settings; some more details below.
> 
> This morning, those connections were back to normal, but the support
> engineer on duty reported that he could not SSH into an entirely
> different server (also CentOS 7, and straight from his workplace
> machine); that problem fixed itself a couple of hours later, too.
> 
> Is this just the spookiest coincidence since last Halloween, or did we 
> chance onto a rare, time-triggered malfunction somewhere in the 
> OpenSSH(/OpenSSL?) crypto ... ?
> 
> -------
> 
> Alas, the support engineer isn't up to debugging SSH connections, so he
> never ran ssh -vv and couldn't name any symptom beyond "it times out".
> I failed to save the output of my own -vv run, but I remember that
> roughly where you'd normally get to the KEXINIT, my client claimed to be
> waiting for some ECDH and then just sat there until the timeout.
> 
> I usually have two keypairs - one ed25519, one RSA - loaded into my 
> agent, and now that things are back to normal, the Kex chosen is 
> curve25519-sha256@libssh.org. In order to circumvent the problem, I had
> to remove my RSA keypair from the agent and use
> 
>> $ ssh -o "KexAlgorithms diffie-hellman-group-exchange-sha256" $SERVER
> 
> to get logged in.
> 
> I started haveged on "my" target machines, but
> /proc/sys/kernel/random/entropy_avail reported > 3kbit anyway, and my
> colleague's remote system already had haveged running, so I doubt that
> it actually made any difference.
> 
> Our monitoring and automated data fetchers apparently never had any
> trouble SSHing into those servers (using RSA keypairs). The server, set
> to LogLevel VERBOSE and typically logging
> 
>> Connection from $CLIENT_IP ...
>> Postponed publickey for $LOCAL_USER ...
> 
> at the beginning of a connection, never wrote the second line for the 
> failed attempts. (With all our accesses getting SNATed, I'm not sure yet 
> whether there are any dangling instances of the *first* line.)
> 
> Nothing in hosts.allow/hosts.deny, and DNS lookups of the client IP
> normally return NXDOMAIN.
> 
> Thanks for any pointers,
> 
> _______________________________________________
> openssh-unix-dev mailing list
> openssh-unix-dev at mindrot.org
> https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
>