Hung connection over Juniper Tunnel
Jason Benguerel
jason at bakafish.com
Tue Feb 10 13:14:18 EST 2009
Sorry for not responding sooner to these suggestions.
This is not a long-term timeout; it locks up immediately after
establishing that the password is valid or that the key file is in
authorized_keys. There is a very high probability that this is MTU
related, as it is an IPsec tunnel over a PPPoE link, so there are
plenty of things that could go wrong. I set the PPPoE MTU to 1492 and
the tunnel endpoints to 1480 on the basis that the largest unfragmented
packet I could transmit had a payload of 1472.
I disabled set_nodelay() on both the client and server sides, but it
didn't have any effect.
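
For readers unfamiliar with the option: set_nodelay() is the OpenSSH
routine that turns on TCP_NODELAY for the connection's socket, which
disables Nagle's algorithm. The following is a minimal Python sketch of
the same socket option, not OpenSSH's code; the helper names are
illustrative only.

```python
import socket


def set_nodelay(sock: socket.socket, enabled: bool = True) -> None:
    """Turn TCP_NODELAY on or off (on means Nagle's algorithm is disabled)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if enabled else 0)


def nodelay_is_set(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently enabled on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    # No connection is needed just to inspect or change the option.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("before:", nodelay_is_set(s))
        set_nodelay(s, True)
        print("after: ", nodelay_is_set(s))
```

Skipping the setsockopt() call leaves Nagle enabled, which is the effect
of the "return;" patch suggested in the quoted message below.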
I understand I have to solve the underlying MTU problem before I'm
able to look at the other issues. Because this is a somewhat
convoluted setup it is proving difficult to figure out. My connection
looks like this:
Client A (MTU 1500) <---> IPsec Tunnel (MTU 1480) <---> PPPoE-VPN (MTU 1492)
    <---> IPsec Tunnel (MTU 1480) <---> Client B (MTU 1500)
From Client A to B I can send up to a 1472-byte packet before it
chokes. However, the B side is only able to send a 1420-byte
packet, for reasons that are not at all clear. I therefore changed the
Client B side of the tunnel to MTU 1448, to no visible effect.
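
The packet sizes above can be related to MTU values with some simple
arithmetic. The sketch below assumes that the quoted sizes are ICMP
payload sizes as given to "ping -s", and that packets are IPv4 without
options (20-byte IP header, 8-byte ICMP header, 20-byte TCP header).
Those assumptions are not stated in the original message.

```python
IP_HEADER = 20    # IPv4 header without options (assumed)
ICMP_HEADER = 8   # ICMP echo request header
TCP_HEADER = 20   # TCP header without options (assumed)


def packet_size_for_ping_payload(payload: int) -> int:
    """Size of the IP packet that carries a ping payload of this many bytes."""
    return payload + ICMP_HEADER + IP_HEADER


def tcp_mss_for_mtu(mtu: int) -> int:
    """Largest TCP segment payload that fits in one unfragmented IP packet."""
    return mtu - IP_HEADER - TCP_HEADER


if __name__ == "__main__":
    for label, payload in (("A to B", 1472), ("B to A", 1420)):
        size = packet_size_for_ping_payload(payload)
        mss = tcp_mss_for_mtu(size)
        print(f"{label}: ping payload {payload} -> IP packet {size}, TCP MSS {mss}")
```

Under these assumptions a 1472-byte payload corresponds to a 1500-byte
IP packet, and a 1420-byte payload to a 1448-byte IP packet, which
matches the MTU 1448 value chosen above.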
Again, sorry to sideline the OpenSSH list with a potentially off-topic
networking issue, but so far only OpenSSH is visibly suffering from
this, and understanding why may allow the tool to become
more robust, or at least flag the exact cause in its debug output.
On Feb 7, 2009, at 2:06 PM, Darren Tucker wrote:
> Damien Miller wrote:
>> On Fri, 6 Feb 2009, Jason Benguerel wrote:
>>> Hello list!
>>>
>>> So I recently reconfigured our office network to allow a permanent
>>> VPN connection to our data center. This consists of a Juniper
>>> SSG-520 connected via a tunnel to a Juniper Netscreen-25 over a
>>> 100M leased NTT VPN (yes I'm tunneling over the VPN as it's the
>>> only way to make it routable.) Here is where OpenSSH comes in.
>>> When I try and ssh to a machine on the other end of the tunnel, I
>>> can get past the authentication stage and then it just hangs and
>>> times out. Everything else works, ping, http, and dns (ICMP, TCP
>>> and UDP in other words.) More cryptically, I can effortlessly ssh
>>> with PuTTY from a windows box. It seems that OpenSSH (or the Unix
>>> TCP/IP stack) is the only thing affected. Now I'm the first to
>>> admit that this is most likely some sort of subtle MTU or
>>> low-level TCP issue, and I'm guessing that OpenSSH is the canary
>>> in the coal mine. It would be great if someone could tell me why
>>> it's freezing so that I can fix the actual cause.
>>>
>>> Several people have complained of similar issues; typically
>>> it turned out to be bad wireless drivers or broken routers, but no
>>> direct cause was ever identified.
>> There are two types of common hang:
>> 1) Long-lived SSH connections being timed out of NAT/firewall state
>> after some period of quiescence. This can be worked around with the
>> ServerAliveInterval and ClientAliveInterval controls in ssh_config
>> and sshd_config respectively.
>> 2) Path MTU blackholes. The hang here usually occurs when either end
>> first sends a packet containing an MTU of data or more. There is no
>> SSH-level workaround for this, but the tool of choice to diagnose it
>> is "ping -D -s xxxx yourhost" where xxxx is the packet size that you
>> want to test (start at 1492 and work down).
>
> 3) some NAT/firewalls seem to choke when Nagle gets disabled on an
> established connection (that's what's happening when the debug
> output says "setting TCP_NODELAY" immediately before your connection
> freezes, which is what makes me think that's the problem here).
>
> You can test this theory by editing misc.c:set_nodelay() and adding
> a "return;" immediately after the variable declarations (this is
> around line 138 in recent versions) and recompiling ssh.
>
> --
> Darren Tucker (dtucker at zip.com.au)
> GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
> Good judgement comes with experience. Unfortunately, the experience
> usually comes from bad judgement.
More information about the openssh-unix-dev mailing list