remote vs local window discrepancy

Damien Miller djm at mindrot.org
Fri Jul 23 09:33:22 EST 2010


On Thu, 22 Jul 2010, David Wierbowski wrote:

> After some investigation I determined that for every packet sent the client
> is decrementing Channel.remote_window by a value that is 4 bytes larger
> than the amount that the server decrements Channel.local_window and
> Channel.local_consumed.  Prior to the stall the server does send
> SSH_MSG_CHANNEL_WINDOW_ADJUST messages.  When it does the "byte to add"
> value is off by 4x the number of packets consumed by the server.
> Eventually over time this drives the client's remote window count to go to
> zero.  As an aside the remote window count has to be exactly 0 for the
> stall to occur.
> 
> Initially the following line of code in channel_output_poll that decrements
> the remote window count  for datagram channels looked suspicious:
> 
> c->remote_window -= dlen + 4;

The extra 4 bytes are there because datagram channels encode packets
as SSH strings. These have a 4-byte length header.

> However, the code that updates Channel.local_window and
> Channel.local_consumed for a datagram channel also includes the +4 in the
> calculation.  Does anybody know why the datagram calculation includes a +4?
> Anybody know what would cause the 4 byte discrepancy I am seeing?

I don't see anything obviously wrong in there. Perhaps the tunnel output
filter is messing up dlen as it strips the header? Are you on a platform
that uses the output filter in openbsd-compat/port-tun.c ?

If this turns out to be true, I guess we will have to change the signature
of the output filter function to return the length of the data that was
actually dequeued (right now it returns the size of the data to be written).

> A complicating factor is that in channel_output_poll the calculation to
> update the remote window in the datagram case does not take into account
> that dlen may be larger than the remote_window size.  Does anybody know
> why?  Perhaps there is a check elsewhere that makes this safe, but I am not
> seeing it.  During problem determination I have observed the value of the
> remote window does occasionally wrap.  When the remote window counter does
> wrap it goes undetected because Channel.remote_window is an unsigned value.

Yes, it looks like there are missing checks there:

Index: channels.c
===================================================================
RCS file: /cvs/src/usr.bin/ssh/channels.c,v
retrieving revision 1.308
diff -u -p -r1.308 channels.c
--- channels.c	13 Jul 2010 23:13:16 -0000	1.308
+++ channels.c	22 Jul 2010 23:20:02 -0000
@@ -2150,6 +2150,14 @@ channel_output_poll(void)
 
 					data = buffer_get_string(&c->input,
 					    &dlen);
+					if (dlen > c->remote_window ||
+					    dlen > c->remote_maxpacket) {
+						debug("channel %d: datagram "
+						    "too big for channel",
+						    c->self);
+						xfree(data);
+						continue;
+					}
 					packet_start(SSH2_MSG_CHANNEL_DATA);
 					packet_put_int(c->remote_id);
 					packet_put_string(data, dlen);

> Another item I find confusing is the test in channel_pre_open to decide if
> the channel's read file descriptor should be turned on in the read fileset.
> That test includes a check of a variable called limit which is set to
> Channel.remote_window when compat20 is true.  Can somebody explain why this
> is remote_window instead of local_window?  The check is "limit > 0" which
> is why the wrapping of remote_window goes undetected

It is set to remote_window because we need to determine whether to stop
reading if their window is already full or if we already have enough data
read but yet to be sent to fill their window.

-d



More information about the openssh-unix-dev mailing list