remote vs local window discrepancy

David Wierbowski wierbows at us.ibm.com
Wed Jul 28 00:51:07 EST 2010


Damien,

I tried the change below.  This appears to take care of the issue with the
remote_window being decremented by a value that is off by 4.
Unfortunately, now the local_window is growing by a value of 4,  It seems
that local_window is being decremented by 4 bytes less than local_consumed.

Dave Wierbowski




From:       Damien Miller <djm at mindrot.org>
To:         David Wierbowski/Endicott/IBM at IBMUS
Cc:         openssh-unix-dev at mindrot.org
Date:       07/22/2010 07:34 PM
Subject:    Re: remote vs local window discrepancy



On Thu, 22 Jul 2010, David Wierbowski wrote:

> After some investigation I determined that for every packet sent the
client
> is decrementing Channel.remote_window by a value that is 4 bytes larger
> than the amount that the server decrements Channel.local_window and
> Channel.local_consumed.  Prior to the stall the server does send
> SSH_MSG_CHANNEL_WINDOW_ADJUST messages.  When it does the "byte to add"
> value is off by 4x the number of packets consumed by the server.
> Eventually over time this drives the client's remote window count to go
to
> zero.  As an aside the remote window count has to be exactly 0 for the
> stall to occur.
>
> Initially the following line of code in channel_output_poll that
decrements
> the remote window count  for datagram channels looked suspicious:
>
> c->remote_window -= dlen + 4;

The extra 4 bytes are there because datagram channels encode packets
as SSH strings. These have a 4-byte length header.

> However, the code that updates Channel.local_window and
> Channel.local_consumed for a datagram channel also includes the +4 in the
> calculation.  Does anybody know why the datagram calculation includes a
+4?
> Anybody know what would cause the 4 byte discrepancy I am seeing?

I don't see anything obviously wrong in there. Perhaps the tunnel output
filter is messing up dlen as it strips the header? Are you on a platform
that uses the output filter in openbsd-compat/port-tun.c ?

If this turns out to be true, I guess we will have to change the signature
of the output filter function to return the length of the data that was
actually dequeued (right now it returns the size of the data to be
written).

> A complicating factor is that in channel_output_poll the calculation to
> update the remote window in the datagram case does not take into account
> that dlen may be larger than the remote_window size.  Does anybody know
> why?  Perhaps there is a check elsewhere that makes this safe, but I am
not
> seeing it.  During problem determination I have observed the value of the
> remote window does occasionally wrap.  When the remote window counter
does
> wrap it goes undetected because Channel.remote_window is an unsigned
value.

Yes, it looks like there are missing checks there:

Index: channels.c
===================================================================
RCS file: /cvs/src/usr.bin/ssh/channels.c,v
retrieving revision 1.308
diff -u -p -r1.308 channels.c
--- channels.c           13 Jul 2010 23:13:16 -0000          1.308
+++ channels.c           22 Jul 2010 23:20:02 -0000
@@ -2150,6 +2150,14 @@ channel_output_poll(void)

                                                             data =
buffer_get_string(&c->input,
                                                                 &dlen);
+                                                            if (dlen > c->
remote_window ||
+                                                                dlen > c->
remote_maxpacket) {
+
debug("channel %d: datagram "
+
"too big for channel",
+
c->self);
+
xfree(data);
+
continue;
+                                                            }
                                                             packet_start
(SSH2_MSG_CHANNEL_DATA);
                                                             packet_put_int
(c->remote_id);

packet_put_string(data, dlen);

> Another item I find confusing is the test in channel_pre_open to decide
if
> the channel's read file descriptor should be turned on in the read
fileset.
> That test includes a check of a variable called limit which is set to
> Channel.remote_window when compat20 is true.  Can somebody explain why
this
> is remote_window instead of local_window?  The check is "limit > 0" which
> is why the wrapping of remote_window goes undetected

It is set to remote_window because we need to determine whether to stop
reading if their window is already full or if we already have enough data
read but yet to be sent to fill their window.

-d





More information about the openssh-unix-dev mailing list