remote vs local window discrepancy

David Wierbowski wierbows at us.ibm.com
Fri Jul 23 10:29:21 EST 2010


Damien,

Thanks for the quick response.  I am on a platform that uses the
output filter function in openbsd-compat/port-tun.c.  I had briefly
looked at the sys_tun_outfilter function and was suspicious when I saw
that it decremented dlen.  I think you might be on to the source of the
discrepancy.

I will add the checks you provided and rebuild.

Your answer about why limit is set to remote_window makes sense.
When I was looking at the check I had the purpose of the names
backwards.

On Thu, 22 Jul 2010, Damien Miller wrote:

>On Thu, 22 Jul 2010, David Wierbowski wrote:
>
>> After some investigation I determined that for every packet sent the
client
>> is decrementing Channel.remote_window by a value that is 4 bytes larger
>> than the amount that the server decrements Channel.local_window and
>> Channel.local_consumed.  Prior to the stall the server does send
>> SSH_MSG_CHANNEL_WINDOW_ADJUST messages.  When it does the "byte to add"
>> value is off by 4x the number of packets consumed by the server.
>> Eventually over time this drives the client's remote window count to go
to
>> zero.  As an aside the remote window count has to be exactly 0 for the
>> stall to occur.
>>
>> Initially the following line of code in channel_output_poll that
decrements
>> the remote window count  for datagram channels looked suspicious:
>>
>> c->remote_window -= dlen + 4;
>
>The extra 4 bytes are there because datagram channels encode packets
>as SSH strings. These have a 4-byte length header.
>
>> However, the code that updates Channel.local_window and
>> Channel.local_consumed for a datagram channel also includes the +4 in
the
>> calculation.  Does anybody know why the datagram calculation includes a
+4?
>> Anybody know what would cause the 4 byte discrepancy I am seeing?
>
>I don't see anything obviously wrong in there. Perhaps the tunnel output
>filter is messing up dlen as it strips the header? Are you on a platform
>that uses the output filter in openbsd-compat/port-tun.c ?
>
>If this turns out to be true, I guess we will have to change the signature
>of the output filter function to return the length of the data that was
>actually dequeued (right now it returns the size of the data to be
written).
>
>> A complicating factor is that in channel_output_poll the calculation to
>> update the remote window in the datagram case does not take into account
>> that dlen may be larger than the remote_window size.  Does anybody know
>> why?  Perhaps there is a check elsewhere that makes this safe, but I am
not
>> seeing it.  During problem determination I have observed the value of
the
>> remote window does occasionally wrap.  When the remote window counter
does
>> wrap it goes undetected because Channel.remote_window is an unsigned
value.
>
>Yes, it looks like there are missing checks there:
>
>Index: channels.c
>===================================================================
>RCS file: /cvs/src/usr.bin/ssh/channels.c,v
>retrieving revision 1.308
>diff -u -p -r1.308 channels.c
>--- channels.c          13 Jul 2010 23:13:16 -0000          1.308
>+++ channels.c          22 Jul 2010 23:20:02 -0000
>@@ -2150,6 +2150,14 @@ channel_output_poll(void)
>
>                                                            data =
buffer_get_string(&c->input,
>                                                                &dlen);
>+                                                           if (dlen > c->
remote_window ||
>+                                                               dlen > c->
remote_maxpacket) {
>+
debug("channel %d: datagram "
>+
"too big for channel",
>+
c->self);
>+
xfree(data);
>+
continue;
>+                                                           }
>                                                            packet_start
(SSH2_MSG_CHANNEL_DATA);
>                                                            packet_put_int
(c->remote_id);
>
packet_put_string(data, dlen);
>
>> Another item I find confusing is the test in channel_pre_open to decide
if
>> the channel's read file descriptor should be turned on in the read
fileset.
>> That test includes a check of a variable called limit which is set to
>> Channel.remote_window when compat20 is true.  Can somebody explain why
this
>> is remote_window instead of local_window?  The check is "limit > 0"
which
>> is why the wrapping of remote_window goes undetected
>
>It is set to remote_window because we need to determine whether to stop
>reading if their window is already full or if we already have enough data
>read but yet to be sent to fill their window.
>
>-d


Dave Wierbowski







More information about the openssh-unix-dev mailing list