internal error: we do not read, but chan_read_failed

Thu Nov 9 03:30:15 EST 2000

Hello,

The error message in the subject line occurs with the new 2.3.0 openssh
version and appeared in the previous snapshots on our Solaris systems. As far
as I remember it was reported, but have not seen any more about this. I have
looked into it a little bit.

First, the file session.c (line 1849 onwards):

        debug("session_exit_message: release channel %d", s->chanid);
        channel_cancel_cleanup(s->chanid);
        /*
         * emulate a write failure with 'chan_write_failed', nobody will be
         * interested in data we write.
         * Note that we must not call 'chan_read_failed', since there could
         * be some more data waiting in the pipe.
         * djm - This is no longer true as we have allowed one pass through
         * the select loop before killing the connection
         */
        if (c->ostate != CHAN_OUTPUT_CLOSED)
                chan_write_failed(c);
        if (c->istate != CHAN_INPUT_CLOSED)
                chan_read_failed(c);
        s->chanid = -1;

The output from the server when copying a file from one system to another
using scp gives:

debug1: channel 0: close_write
debug1: Received SIGCHLD.
debug1: tvp!=NULL kid 1 mili 100
debug1: session_by_pid: pid 28343
debug1: session_exit_message: session 0 channel 0 pid 28343
debug1: session_exit_message: release channel 0
debug1: channel 0: read failed
debug1: channel 0: input open -> drain
debug1: tvp!=NULL kid 1 mili 100
debug1: session_by_pid: pid 28343
debug1: session_exit_message: session 0 channel 0 pid 28343
debug1: session_exit_message: release channel 0
debug1: channel 0: read failed
debug1: channel 0: input open -> drain
debug1: channel 0: close_read
debug1: channel 0: input: no drain shortcut
debug1: channel 0: ibuf empty
debug1: channel 0: input drain -> closed                                    
debug1: channel 0: send eof
debug1: session_free: session 0 pid 28343
debug1: channel 0: read<=0 rfd 8 len 0
debug1: channel 0: read failed
error: channel 0: internal error: we do not read, but chan_read_failed for
istate 8
debug1: channel 0: closing efd 10
debug1: channel 0: send close
debug1: channel 0: rcvd close
debug1: channel 0: full closed2
debug1: channel_free: channel 0: status: The following connections are open:
  #0 server-session (t4 r0 i8/0 o128/0 fd 8/8) 

However if I comment out from session.c the call to 'chan_read_failed'
(which I assume was not there previously), then I get:

debug1: channel 0: close_write
debug1: Received SIGCHLD.
debug1: tvp!=NULL kid 1 mili 100
debug1: session_by_pid: pid 10918
debug1: session_exit_message: session 0 channel 0 pid 10918
debug1: session_exit_message: release channel 0
debug1: session_free: session 0 pid 10918
debug1: channel 0: read<=0 rfd 8 len 0
debug1: channel 0: read failed
debug1: channel 0: input open -> drain
debug1: channel 0: close_read
debug1: channel 0: input: no drain shortcut
debug1: channel 0: ibuf empty
debug1: channel 0: input drain -> closed
debug1: channel 0: send eof
debug2: channel 0: read 0 from efd 10
debug1: channel 0: closing efd 10
debug1: channel 0: send close
debug1: channel 0: rcvd close
debug1: channel 0: full closed2
debug1: channel_free: channel 0: status: The following connections are open:
  #0 server-session (t4 r0 i8/0 o128/0 fd 8/8)

SIGCHLD is caught within serverloop.c. This calls 'session_close_by_pid'
which calls 'session_exit_message'. At this point everything seems to be
pretty much the same in the original and the modified version. However, next
in the modified version is the call to 'session_free' as seen in the latter
server output. In the original version 'session_exit_message' calls
'chan_read_failed' because the input channel state is not (yet) closed
(CHAN_INPUT_CLOSED); it is open but gets changed to CHAN_INPUT_WAIT_DRAIN
(it seems). As such it attempts to read more info; gets none and goes
through the 'read failed'; 'input open -> drain'; 'input open -> closed';
'send eof' sequence. *Then* 'session_free' is called (as part of session
close), and we end up back in session.c. Next (in serverloop.c) is called
'channel_after_select' which calls channel_handler. This calls (in our case
we are using protocol 2) 'channel_post_open_2', which calls
'channel_handle_rfd' which (eventually) calls 'chan_read_failed' because the
channel still hasn't been closed. As such in the original openssh we get the
final - 

debug1: channel 0: read<=0 rfd 8 len 0
debug1: channel 0: read failed
error: channel 0: internal error: we do not read, but chan_read_failed for
istate 8

 - messages appearing.

Okay, so I'm not *sure* that this is how it all seems to hang together, but
it looks like it. Basically, the extra 'chan_read_failed' call in session.c
seems to be the culprit because it starts the session closing sequence, but
the serverloop (which it eventually returns to) seems to expect to do that
with the 'channel_after_select'. As things have already been started, it
comes across unexpected conditions and (ultimately) produces the error
message.

If someone who knows what they are doing (not me!) wants to go through this
and either verify or not that this is the cause, then I'd be grateful.

I have commented out the call to 'chan_read_failed' here and have seen no
problems so far.

That's it - phew!

John.

------------------------------------------------------------------------
John Horne, University of Plymouth, UK           Tel: +44 (0)1752 233914
E-mail: jhorne at plymouth.ac.uk
PGP key available from public key servers