[Bug 1633] New: Race condition in ssh-agent AUTH_CONNECTION

Wed Aug 19 06:27:27 EST 2009

https://bugzilla.mindrot.org/show_bug.cgi?id=1633

           Summary: Race condition in ssh-agent AUTH_CONNECTION
           Product: Portable OpenSSH
           Version: 5.2p1
          Platform: ix86
        OS/Version: Linux
            Status: NEW
          Keywords: patch
          Severity: normal
          Priority: P2
         Component: ssh-agent
        AssignedTo: unassigned-bugs at mindrot.org
        ReportedBy: noodle10000 at googlemail.com
                CC: djm at mindrot.org, ohannet at allez-oop.net
        Depends on: 1254

--- Comment #0 from noodle10000 at googlemail.com 2009-08-19 06:27:26 EST ---
I have the same issue as encountered in bug 1254.  When launching
thousands of SSH connections via a script (the open source
taktuk/kanif) using ssh-agent to forward keys, occasionally I will see
ssh-agent hang and consume 100% of one CPU.  This does not happen every
time, but around 1 out of every 3 runs.

I have compiled 5.2p1, which exhibits the same issue.  strace at
the time of the hang reports an EAGAIN error on a read call.  A few
printfs isolated the code in question to be the same as mentioned in
bug 1254, but the suggested workaround (add a usleep before trying the
read again) does not work in any case.

This issue is also reported at
http://www.plug.org/pipermail/plug/2009-April/033800.html

+++ This bug was initially created as a clone of Bug #1254 +++

In function after_select(), case AUTH_CONNECTION, the do-loop which
handles socket reads will peg my CPU at close to 100% when errno is
EAGAIN.

I'm running FreeBSD 6.2 pre-release, with OpenSSH built from the ports
collection (security/openssh-portable).

The problem only occurs for me while running an automation script that
sends commands through ssh to about a hundred servers at a time, and I
have not been successful in identifying which server causes the
problem.  But the bottom line is that the read fails with errno EAGAIN,
and continues to fail in a very tight loop until a timeout occurs at
some point.

My work-around was to introduce a tiny sleep before the continue
statement in that loop, which is apparently enough to allow some data
to become available for reading, and makes the problem go away.

I will attach my work-around as a patch, realizing that usleep() is
probably not available on all platforms.
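
For illustration only, here is a minimal sketch of an alternative to
sleeping before the retry: when read() fails with EAGAIN, wait in
poll() until the descriptor is readable, so the loop does not spin.
This is not the attached patch and not the actual ssh-agent code;
read_exact_nb() and set_nonblocking() are hypothetical helper names,
and the timeout applies to each wait rather than to the whole read.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into non-blocking mode. */
static int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Hypothetical helper (not part of OpenSSH): read exactly len bytes
 * from a non-blocking descriptor.  On EAGAIN it sleeps in poll() until
 * data is readable instead of calling read() again in a tight loop.
 * Returns 0 on success, -1 on error, EOF, or a timed-out wait.
 */
static int
read_exact_nb(int fd, void *buf, size_t len, int timeout_ms)
{
	char *p = buf;
	size_t off = 0;

	while (off < len) {
		ssize_t n = read(fd, p + off, len - off);

		if (n > 0) {
			off += (size_t)n;
			continue;
		}
		if (n == 0)
			return -1;		/* peer closed the connection */
		if (errno == EINTR)
			continue;		/* interrupted: retry at once */
		if (errno == EAGAIN || errno == EWOULDBLOCK) {
			struct pollfd pfd = { .fd = fd, .events = POLLIN };
			int r = poll(&pfd, 1, timeout_ms);

			if (r == 0)
				return -1;	/* nothing arrived in time */
			if (r == -1 && errno != EINTR)
				return -1;	/* poll itself failed */
			continue;		/* readable, or interrupted */
		}
		return -1;			/* any other read error */
	}
	return 0;
}
```

Compared with a fixed usleep(), the process is woken as soon as data
arrives and uses no CPU while waiting, and poll() is specified by
POSIX.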
