2.9p?: connection hangs with agent forwarding

Lutz Jaenicke Lutz.Jaenicke at aet.TU-Cottbus.DE
Wed Jul 4 23:20:55 EST 2001


On Tue, Jul 03, 2001 at 05:46:09PM +0200, Lutz Jaenicke wrote:
> when using agent forwarding, the connection hangs on exit, if the agent has
> been accessed.
...

Following up to myself. I have spent some more hours^H^H^H^H^Htime with
debugger and source and think that I finally nailed down the reason for
this problem.

> Symptoms:
> - On the server side, the following steps are logged:
> debug1: channel 3: new [accepted auth socket]
> debug1: channel 3: open confirm rwindow 4096 rmax 32768
> debug1: channel 3: read<=0 rfd 11 len 0
> debug1: channel 3: read failed
> debug1: channel 3: input open -> drain
> debug1: channel 3: close_read
> debug1: channel 3: ibuf empty
> debug1: channel 3: input drain -> closed
> debug1: channel 3: send eof

At this point, the process accessing the agent remotely has finished and
closed its connection. sshd therefore closes the channel in the read
direction: the "select on read from remote process" triggered and the
read returned 0 bytes (the "read<=0 rfd 11 len 0" line above), so the
read side is shut down.
["*" Please bear in mind that the process accessing the agent remotely
has closed the connection completely, so "select for write" would fail
as well; since there is no data waiting to be written, this goes
unnoticed.]
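
The point made in "*" can be sketched in a few lines of C. This is only an
illustration on an AF_UNIX socketpair, not code from sshd; the function name
full_close_visible_both_ways() is made up for this example. After a complete
close() by the peer, a read reports EOF and a write fails as well, but the
write failure only shows up if somebody actually tries to write.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo, not OpenSSH code.
 * Returns 1 if, after the peer's complete close(), both read() reports
 * EOF and write() fails with EPIPE; 0 if not; -1 on setup error.
 */
int full_close_visible_both_ways(void)
{
    int sv[2];
    char buf[1];
    void (*old_handler)(int);
    int read_eof, write_epipe;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* Writing to a closed peer raises SIGPIPE; ignore it for the test. */
    old_handler = signal(SIGPIPE, SIG_IGN);

    /* sv[1] plays the process that accessed the agent: it closes fully. */
    close(sv[1]);

    /* sv[0] plays sshd: the read direction reports EOF ... */
    read_eof = (read(sv[0], buf, sizeof(buf)) == 0);

    /* ... and the write direction is dead too, if anybody tries it. */
    errno = 0;
    write_epipe = (write(sv[0], "x", 1) == -1 && errno == EPIPE);

    signal(SIGPIPE, old_handler);
    close(sv[0]);
    return read_eof && write_epipe;
}
```

A caller would simply check that full_close_visible_both_ways() returns 1.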

> - On the client side, when the agent is accessed, the following output
>   is being logged:
> debug1: channel 1: new [authentication agent connection]
> debug1: confirm auth-agent@openssh.com
> debug1: channel 1: rcvd eof
> debug1: channel 1: output open -> drain
> debug1: channel 1: obuf empty
> debug1: channel 1: output drain -> closed
> debug1: channel 1: close_write

The ssh process started with "-A" now receives the "close read channel" from
the sshd process (which it sees as its own writing side). When close_write
is logged, the ssh process uses "shutdown(sock, WR)" to shut down the
connection to the actual agent process.
This is where the problem appears: the agent process is in select() for read,
and on this platform the "shutdown(sock, WR)" does not wake up select().
ssh-agent therefore never notices that the connection was actually closed.
[Coming back to "*": at this point ssh-agent would normally close() the
connection to ssh, ssh would then initiate the closing of the other
direction of the channel, and the channel would be closed completely. As
this does not happen, the channel stays open in one direction.]
{As ssh-agent does not notice the close, the slot for the connection is never
freed and ssh-agent will eventually run out of resources. This was reported
quite some time ago, but at that time I did not see the relation to the
"hanging" agent connection problem.}
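
A minimal sketch of the behaviour in question, again on an AF_UNIX
socketpair and not taken from ssh or ssh-agent (the function name
half_close_wakes_reader() is made up for this example). One end does a
write-side shutdown() only; the other end waits in select() for read. Where
half-close works as expected, select() returns and the read yields EOF.
On a platform behaving as described above, one would expect the select()
to run into the timeout instead.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical demo, not OpenSSH code.
 * Returns 1 if a reader waiting in select() is woken by the peer's
 * shutdown(SHUT_WR) and then reads EOF, 0 if select() times out,
 * -1 on any error.
 */
int half_close_wakes_reader(void)
{
    int sv[2];
    fd_set rfds;
    struct timeval tv;
    char buf[1];
    int n, result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* sv[0] plays ssh: stop writing, but keep the descriptor open. */
    if (shutdown(sv[0], SHUT_WR) == -1)
        goto out;

    /* sv[1] plays ssh-agent: wait for readability only. */
    FD_ZERO(&rfds);
    FD_SET(sv[1], &rfds);
    tv.tv_sec = 2;              /* arbitrary timeout for the demo */
    tv.tv_usec = 0;
    n = select(sv[1] + 1, &rfds, NULL, NULL, &tv);
    if (n == -1)
        goto out;
    if (n == 0) {
        result = 0;             /* never woken: the hang described above */
        goto out;
    }

    /* Readable with a zero-length read means the peer signalled EOF. */
    result = (read(sv[1], buf, sizeof(buf)) == 0) ? 1 : -1;
out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

A return value of 0 from half_close_wakes_reader() would correspond to the
situation in which ssh-agent stays in select() forever.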

> The platform is HP-UX 10.20 (hanging on "sleep 20 &" test, maybe this is
> related!?).
> This is true with 2.9p1 (and older versions, if memory serves me right)
> up to the latest portable-CVS.
> I can fire up the debugger to help track it down, but by digging through
> the source I didn't find out who should close the channel (server or client).
> Shooting into the dark: HP-UX 10.20 needs USE_PIPES and must call close(),
> as shutdown() in just one direction does not seem to work as on other
> platforms (see serverloop.c).

This shutdown() problem is precisely the reason why USE_PIPES is already
needed for other connections on HP-UX.
It may therefore be possible to reproduce this problem on the other
platforms requiring USE_PIPES (cygwin, hpux, NeXT, SunOS4, SNI, SYSV*, SCO*).
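
The close()-instead-of-shutdown() idea can be sketched as follows. This is
an assumption-laden illustration, not the code in serverloop.c: the helper
close_write_side() and its run-time flag half_close_works are hypothetical
(in the real source the decision is made at compile time via USE_PIPES).

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not an OpenSSH function.
 * Ends the write direction of *fdp. Where half-close is reliable, only
 * the write side is shut down; otherwise the descriptor is closed
 * completely and *fdp is set to -1.
 */
int close_write_side(int *fdp, int half_close_works)
{
    int r;

    if (half_close_works)
        return shutdown(*fdp, SHUT_WR);
    r = close(*fdp);
    *fdp = -1;
    return r;
}

/*
 * Returns 1 if the peer reads EOF after close_write_side(), 0 if it
 * does not, -1 on error.
 */
int peer_sees_eof(int half_close_works)
{
    int sv[2];
    char buf[1];
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (close_write_side(&sv[0], half_close_works) == -1) {
        if (sv[0] != -1)
            close(sv[0]);
        close(sv[1]);
        return -1;
    }
    /* With either strategy the peer should now read EOF. */
    ok = (read(sv[1], buf, sizeof(buf)) == 0);
    if (sv[0] != -1)
        close(sv[0]);
    close(sv[1]);
    return ok;
}
```

The price of the fallback is that the descriptor is gone for both
directions, so nothing can be read from it afterwards either.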

I am not sure which solution to recommend. I have not yet checked whether
select() would trigger on the errorfd (recent reports for prngd have indicated
that errorfd, in some documentation also referred to as exception_fd,
behaves differently between platforms). As its use is not prepared in
e.g. clientloop.c and quite some changes would be needed, I have not
yet touched it.

On the other hand, if sshd reacted to the complete close of its local
agent connection and initiated the bi-directional shutdown itself
(the accessing process has actually closed both sides), the problem would
not appear either.

So much for now,
	Lutz
-- 
Lutz Jaenicke                             Lutz.Jaenicke at aet.TU-Cottbus.DE
BTU Cottbus               http://www.aet.TU-Cottbus.DE/personen/jaenicke/
Lehrstuhl Allgemeine Elektrotechnik                  Tel. +49 355 69-4129
Universitaetsplatz 3-4, D-03044 Cottbus              Fax. +49 355 69-4153


