Hung connection over Juniper Tunnel

Jason Benguerel jason at bakafish.com
Tue Feb 10 13:14:18 EST 2009


Sorry for not responding to these suggestions sooner.

This is not a long-term timeout: the connection locks up immediately
after the password is accepted or the key is found in authorized_keys.
It is very likely MTU related, since this is an IPsec tunnel over a
PPPoE link and there are plenty of things that could go wrong. I set
the PPPoE MTU to 1492 and the tunnel endpoints to 1480 on the basis that
the largest unfragmented packet I could transmit had a payload of 1472.

I disabled set_nodelay() on both the client and the server, but it
had no effect.
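
For anyone following along without the OpenSSH source, here is a rough
sketch of what that experiment amounts to. This is not OpenSSH's code,
only a Python illustration: the real set_nodelay() lives in misc.c, and
the "skip" parameter and the nodelay_enabled() helper are hypothetical
names invented here. Skipping the call leaves Nagle's algorithm enabled,
which is the condition the test was meant to create.

```python
import socket


def set_nodelay(sock, skip=False):
    """Enable TCP_NODELAY on sock; skip=True mimics the early "return;"."""
    if skip:
        return False  # leave Nagle's algorithm enabled
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return True


def nodelay_enabled(sock):
    """Report whether TCP_NODELAY is currently set on sock."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    # An unconnected TCP socket is enough to show the option changing.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        set_nodelay(s, skip=True)
        print("after skipped call:", nodelay_enabled(s))
        set_nodelay(s)
        print("after normal call:", nodelay_enabled(s))
```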

I understand I have to solve the underlying MTU problem before I'm  
able to look at the other issues. Because this is a somewhat  
convoluted setup it is proving difficult to figure out. My connection  
looks like this:

Client A (MTU1500) <---> IPsec Tunnel (MTU1480) <---> PPPoE-VPN (MTU1492)
    <---> IPsec Tunnel (MTU1480) <---> Client B (MTU1500)

From Client A to B I can send up to a 1472-byte packet before it
chokes. From B to A, however, the largest packet that gets through is
1420 bytes, for reasons that are not at all clear. I therefore changed
the Client B side of the tunnel to MTU1448, to no visible effect.
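
As a sanity check on those numbers, here is a minimal sketch of the
arithmetic. It assumes the sizes quoted are ICMP echo payloads (what
ping -s takes) sent over IPv4 with no IP or TCP options; the function
names are made up for illustration.

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
ICMP_HEADER = 8    # bytes, ICMP echo header
TCP_HEADER = 20    # bytes, assuming no TCP options


def packet_size_for_ping_payload(payload):
    """Size of the IPv4 packet produced by an ICMP echo with this payload."""
    return payload + ICMP_HEADER + IPV4_HEADER


def tcp_mss_for_mtu(mtu):
    """Largest TCP payload that fits in a single packet of this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


for label, payload in (("A -> B", 1472), ("B -> A", 1420)):
    mtu = packet_size_for_ping_payload(payload)
    print(f"{label}: payload {payload} -> packet {mtu}, "
          f"TCP MSS {tcp_mss_for_mtu(mtu)}")
```

Under those assumptions a 1472-byte payload corresponds to a 1500-byte
packet and a 1420-byte payload to a 1448-byte packet, which matches the
1448 figure above.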

Again, sorry to sideline the OpenSSH list with a potentially off-topic
networking issue, but so far only OpenSSH is visibly suffering from
this, and understanding why might allow the tool to become more robust,
or at least to flag the exact cause in its debug output.


On Feb 7, 2009, at 2:06 PM, Darren Tucker wrote:

> Damien Miller wrote:
>> On Fri, 6 Feb 2009, Jason Benguerel wrote:
>>> Hello list!
>>>
>>> So I recently reconfigured our office network to allow a permanent
>>> VPN connection to our data center. This consists of a Juniper
>>> SSG-520 connected via a tunnel to a Juniper Netscreen-25 over a
>>> 100M leased NTT VPN (yes, I'm tunneling over the VPN as it's the
>>> only way to make it routable). Here is where OpenSSH comes in.
>>> When I try to ssh to a machine on the other end of the tunnel, I
>>> can get past the authentication stage and then it just hangs and
>>> times out. Everything else works: ping, http, and dns (ICMP, TCP
>>> and UDP, in other words). More cryptically, I can effortlessly ssh
>>> with PuTTY from a Windows box. It seems that OpenSSH (or the Unix
>>> TCP/IP stack) is the only thing affected. Now, I'm the first to
>>> admit that this is most likely some sort of subtle MTU or low-level
>>> TCP issue, and I'm guessing that OpenSSH is the canary in the coal
>>> mine. It would be great if someone could tell me why it's freezing
>>> so that I can fix the actual cause.
>>>
>>> There were several people complaining of similar issues; typically
>>> it turned out to be bad wireless drivers or broken routers, but no
>>> direct cause was ever indicated.
>> There are two common types of hang:
>> 1) Long-lived but idle SSH connections being timed out of NAT/firewall
>>   state after some period of quiescence. This can be worked around with
>>   the ServerAliveInterval and ClientAliveInterval controls in ssh_config
>>   and sshd_config respectively.
>> 2) Path MTU blackholes. The hang here usually occurs when either end
>>   first sends a packet containing an MTU of data or more. There is no
>>   SSH-level workaround for this, but the tool of choice to diagnose it
>>   is "ping -D -s xxxx yourhost", where xxxx is the packet size that you
>>   want to test (start at 1492 and work down).
>
> 3) some NAT/firewalls seem to choke when Nagle gets disabled on an  
> established connection (that's what's happening when the debug  
> output says "setting TCP_NODELAY" immediately before your connection  
> freezes, which is what makes me think that's the problem here).
>
> You can test this theory by editing misc.c:set_nodelay() and adding  
> a "return;" immediately after the variable declarations (this is  
> around line 138 in recent versions) and recompiling ssh.
>
> -- 
> Darren Tucker (dtucker at zip.com.au)
> GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4  37C9 C982 80C7 8FF4 FA69
>    Good judgement comes with experience. Unfortunately, the experience
> usually comes from bad judgement.
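
The "start at 1492 and work down" procedure quoted above can be done as
a binary search rather than one size at a time. The following is only a
sketch: the probe is passed in as a function, and fake_probe() is a
stand-in invented here so the example runs without a network. A real
probe would wrap a don't-fragment ping and return True when a reply
comes back. Note that -D is the BSD spelling of the don't-fragment flag;
iputils ping on Linux uses "-M do" instead.

```python
def largest_passing_payload(probe, low=0, high=1492):
    """Binary-search the largest payload size for which probe(size) is True.

    Assumes probe is monotonic: if a size passes, every smaller size passes.
    Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid        # mid passes, so the answer is mid or larger
        else:
            high = mid - 1   # mid fails, so the answer is below mid
    return low


def fake_probe(size):
    """Stand-in for a real ping: pretend the path passes payloads up to 1420."""
    return size <= 1420


print(largest_passing_payload(fake_probe))  # prints 1420
```

Adding 28 bytes (20 for the IPv4 header, 8 for the ICMP header) to the
result gives the size of the largest packet the path actually carried.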



More information about the openssh-unix-dev mailing list