Hung connection over Juniper Tunnel
Darren Tucker
dtucker at zip.com.au
Sat Feb 7 16:06:22 EST 2009
Damien Miller wrote:
> On Fri, 6 Feb 2009, Jason Benguerel wrote:
>
>> Hello list!
>>
>> So I recently reconfigured our office network to allow a permanent VPN
>> connection to our data center. This consists of a Juniper SSG-520
>> connected via a tunnel to a Juniper Netscreen-25 over a 100M leased
>> NTT VPN (yes I'm tunneling over the VPN as it's the only way to make
>> it routable.) Here is where OpenSSH comes in. When I try and ssh to a
>> machine on the other end of the tunnel, I can get past the
>> authentication stage and then it just hangs and times out. Everything
>> else works, ping, http, and dns (ICMP, TCP and UDP in other words.)
>> More cryptically, I can effortlessly ssh with PuTTY from a windows
>> box. It seems that OpenSSH (or the Unix TCP/IP stack) is the only
>> thing affected. Now I'm the first to admit that this is most likely
>> some sort of subtle MTU or low-level TCP issue, and I'm guessing that
>> OpenSSH is the canary in the coal mine. It would be great if I could get
>> someone to tell me why it's freezing so that I can fix the actual cause.
>>
>> There were several people complaining of similar issues; typically it
>> turned out to be bad wireless drivers or broken routers, but no direct
>> cause was ever indicated.
>
> There are two types of common hang:
>
> 1) Long-lived but idle SSH connections being timed out of NAT/firewall
> state after some period of quiescence. This can be worked around with the
> ServerAliveInterval and ClientAliveInterval controls in ssh_config and
> sshd_config respectively.
>
> 2) Path MTU blackholes. The hang here usually occurs when either end first
> sends a packet containing an MTU of data or more. There is no SSH-level
> workaround for this, but the tool of choice to diagnose it is
> "ping -D -s xxxx yourhost" where xxxx is the packet size that you want
> to test (start at 1492 and work down). On BSD ping, -D sets the Don't
> Fragment bit; Linux ping uses "-M do" for that instead.

3) Some NAT/firewalls seem to choke when Nagle gets disabled on an
established connection (that's what's happening when the debug output
says "setting TCP_NODELAY" immediately before your connection freezes,
which is what makes me think that's the problem here).

You can test this theory by editing misc.c:set_nodelay() and adding a
"return;" immediately after the variable declarations (this is around
line 138 in recent versions) and recompiling ssh.
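
As a rough illustration of that experiment, here is a minimal standalone
sketch. It is not the OpenSSH source: set_nodelay_sketch(), get_nodelay()
and the skip_nodelay switch are names made up for this example, and the
real set_nodelay() in misc.c differs in its details. The sketch only
assumes the standard TCP_NODELAY socket option, which is what disabling
Nagle means at the socket level.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical switch: 1 mimics the early "return;", leaving Nagle on. */
int skip_nodelay = 0;

/* Report TCP_NODELAY on fd: 1 if set, 0 if clear, -1 on error. */
int
get_nodelay(int fd)
{
	int opt = 0;
	socklen_t optlen = sizeof(opt);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &opt, &optlen) == -1)
		return -1;
	return opt != 0;
}

/*
 * Disable Nagle on fd unless skip_nodelay is set.
 * Returns 0 on success or when skipped, -1 if setsockopt fails.
 */
int
set_nodelay_sketch(int fd)
{
	int opt = 1;

	if (skip_nodelay)
		return 0;	/* the experimental early return */

	if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &opt, sizeof(opt)) == -1) {
		fprintf(stderr, "setsockopt TCP_NODELAY: %s\n",
		    strerror(errno));
		return -1;
	}
	return 0;
}
```

With skip_nodelay set to 1 the socket keeps Nagle enabled, which should
have the same effect as the early return. If the hang goes away with that
change, the NAT/firewall in the path is the likely culprit.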
--
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.