suggested fix for the sigchld race

Dan Astoorian djast at cs.toronto.edu
Thu Nov 1 06:23:00 EST 2001


On Wed, 31 Oct 2001 13:33:38 EST, Nicolas Williams writes:
> Does this make it unnecessary to block SIGCHLD around where
> child_terminated is manipulated? At first glance I'd say yes...

It might be cleaner to eliminate child_terminated altogether in favour
of the new mechanism for polling for terminated children.

If child_terminated is still necessary, then the blocking in
collect_children() shouldn't be removed.  Otherwise, a SIGCHLD that
arrives during collect_children() might set child_terminated only to
have collect_children() clear it again, so that child would go
unnoticed.

In general, I'd like to see SIGCHLD and other signals blocked more of
the time, not less: the arrival of a signal can cause system calls to be
interrupted, among other possible mischief and complication.

(Note that SA_RESTART doesn't work around that problem effectively: that
flag affects some, but not all, system calls.  In Pine under Solaris 8,
I've seen SIGALRM cause the unlink() of lock files over NFS to fail, for
example.)

IMHO, the safest and simplest approach in general is often to keep the
signal blocked at all times, except when we know we're prepared to
handle it; e.g., unblock it before any calls which we expect may block
for a lengthy time--such as select()--and be prepared for those calls to
fail with EINTR.

FWIW, I'm not fond of the debug() call inside sigchld_handler(); it can
cause async-signal-unsafe operations to occur.  (BTW, I'm guessing that
we already know that grace_alarm_handler() in ssh.c does
async-signal-unsafe operations, hence the comment "/* XXX no idea how
fix this signal handler */" :-) )
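
One way to avoid that, sketched under the assumption that the message
is only informational: let the handler do nothing but set a flag, and
log from the main loop.  note_child_terminated() is a hypothetical
name, and fprintf() stands in for debug().

```c
#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t child_terminated = 0;

/* Handler: only stores to a sig_atomic_t, which is async-signal-safe. */
static void
sigchld_handler(int sig)
{
	(void)sig;
	child_terminated = 1;
}

/*
 * Called from the main loop, never from a handler, just before the
 * reaping code clears the flag.  Returns 1 if a message was logged.
 */
static int
note_child_terminated(void)
{
	if (!child_terminated)
		return 0;
	fprintf(stderr, "debug: received SIGCHLD\n");
	return 1;
}
```

The message comes out slightly later than the signal itself, but stdio
and anything else behind the logging call never run in signal context.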

-- 
Dan Astoorian               People shouldn't think that it's better to have
Sysadmin, CSLab             loved and lost than never loved at all.  It's
djast at cs.toronto.edu        not, it's better to have loved and won.  All
www.cs.toronto.edu/~djast/  the other options really suck.    --Dan Redican



More information about the openssh-unix-dev mailing list