SCO OS 5.0.5 issues (was Re: Solved: on Solaris, "couldn't wait for child '...' completion: No child processes")

Aran Cox acox at cv.telegroup.com
Sat May 20 00:41:13 EST 2000


I am seeing the same errors when using the built-in PRNG (entropy
gathering). I raised the timeout as suggested and it made no difference
on my system. I am trying to get OpenSSH 2.1.0 working on SCO OS 5.0.5
using the SCO development environment.

Before I get into my troubles with the "Couldn't wait for child" errors,
I'll lay out what I did to get OpenSSH 2.1.0 running on SCO OS:

I had to define MAXPATHLEN in defines.h; I defined it as 1024.

I couldn't figure out which SCO OS header is supposed to provide it. The
only definition I think I found is in /udk/usr/include/sys/param.h, as
1024, so I added the same value to defines.h by hand.
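A guarded definition keeps such a change from clobbering MAXPATHLEN on systems whose headers already define it. The fragment below is a sketch of what the defines.h addition could look like. Preferring PATH_MAX from <limits.h> when it exists is an assumption, not something checked on SCO; 1024 is the value reported above.

```c
#include <limits.h>   /* PATH_MAX, on systems that provide it */

/*
 * Sketch of a defines.h fallback: only define MAXPATHLEN if no system
 * header did. PATH_MAX is preferred when present (an assumption); the
 * 1024 fallback matches the value found in the UDK's sys/param.h.
 */
#ifndef MAXPATHLEN
# ifdef PATH_MAX
#  define MAXPATHLEN PATH_MAX
# else
#  define MAXPATHLEN 1024
# endif
#endif
```

With this in place, code such as `char buf[MAXPATHLEN];` compiles whether or not the platform defines the macro itself.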

If HAVE_DEV_PTMX is defined, pty.c (function pty_alloc) uses code that
seems to be designed for Solaris 2.x. The comment above that code
reads:
/*
 * This code is used e.g. on Solaris 2.x.  (Note that Solaris 2.3
 * also has bsd-style ptys, but they simply do not work.)
 */

It uses device names like /dev/pts000, and with those the code in
pty_make_controlling_tty fails. Specifically, this check fails:

/* Verify that we now have a controlling tty. */
fd = open("/dev/tty", O_WRONLY);
if (fd < 0)
   error("open /dev/tty failed - could not set controlling tty: %.100s",
   strerror(errno));
else {
   close(fd);
}

As a result, sshd logs this message when run with the -d option:

error: open /dev/tty failed - could not set controlling tty: No such device or address

This doesn't stop OpenSSH from working, but I can't use the resize
command (which needs a controlling tty), and that's a problem.

If I change the line in config.h that defines HAVE_DEV_PTMX to
#undef HAVE_DEV_PTMX
then sshd is built with the other pty code, which works exactly as
expected and picks tty device names like /dev/ttyp8.

I don't know what to make of the /dev/pts* problem. Is it possible that
/dev/pts* aren't ttys? Are SCO OpenServer's /dev/pts* devices broken,
just as the comment says Solaris 2.3's are? Or is there simply another
method for releasing/setting the controlling tty under SCO?
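For reference when comparing platforms: on System V-style systems a process usually gets a controlling tty by becoming a session leader with setsid() and then opening the pty slave without O_NOCTTY; the open("/dev/tty") test quoted above only checks the outcome. The sketch below walks through that sequence with /dev/ptmx and the SVR4 calls grantpt(), unlockpt() and ptsname(). It is not the pty.c code, the helper name try_acquire_ctty is hypothetical, whether SCO OS 5.0.5 behaves this way is unverified, and it omits platform-specific steps (Solaris, for example, also pushes the ptem and ldterm STREAMS modules).

```c
#define _XOPEN_SOURCE 600   /* grantpt(), unlockpt(), ptsname() */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate a pty through /dev/ptmx, then have a
 * child become a session leader and open the slave without O_NOCTTY.
 * Returns 1 if the child could then open /dev/tty, 0 if it could not,
 * and -1 if the pty or the child could not be set up.
 */
int try_acquire_ctty(void)
{
    int master, status;
    const char *slave_name;
    pid_t pid;

    master = open("/dev/ptmx", O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    if (grantpt(master) < 0 || unlockpt(master) < 0 ||
        (slave_name = ptsname(master)) == NULL) {
        close(master);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(master);
        return -1;
    }
    if (pid == 0) {
        int slave, fd;

        close(master);                       /* parent keeps its copy */
        if (setsid() < 0)                    /* new session, no ctty yet */
            _exit(2);
        slave = open(slave_name, O_RDWR);    /* no O_NOCTTY on purpose */
        if (slave < 0)
            _exit(2);
        fd = open("/dev/tty", O_WRONLY);     /* same check as quoted above */
        if (fd < 0)
            _exit(1);
        close(fd);
        close(slave);
        _exit(0);
    }

    /* Keep the master open until the child is done with the slave. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            close(master);
            return -1;
        }
    }
    close(master);

    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

A result of 0 on a given system would point at the slave open not assigning a controlling tty there, rather than at the device names themselves.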

The "Couldn't wait for child" error message is generated after a failed
call to waitpid. In the initial sshd process, the commands run to gather
entropy exit and become zombies, so the waitpid call succeeds as
expected. In the sshd processes forked to handle incoming connections,
the commands do not become zombies; they just exit, and the subsequent
waitpid call fails.
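One mechanism that produces exactly this symptom, offered as an assumption rather than a diagnosis of sshd: if SIGCHLD is set to SIG_IGN in the process that forks the commands, or a SIGCHLD handler reaps them with waitpid(-1, ...), the children never linger as zombies and a later waitpid() on them fails with ECHILD ("No child processes"). The sketch demonstrates the SIG_IGN case; reap_short_lived_child is a made-up name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits at once (like a quick entropy command), then
 * try to reap it with waitpid(). Returns 0 if waitpid() collected it,
 * otherwise the errno value waitpid() failed with.
 */
int reap_short_lived_child(int ignore_sigchld)
{
    struct sigaction sa, old;
    pid_t pid, r;
    int status, result;

    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = ignore_sigchld ? SIG_IGN : SIG_DFL;
    if (sigaction(SIGCHLD, &sa, &old) < 0)
        return errno;

    pid = fork();
    if (pid < 0) {
        result = errno;
        sigaction(SIGCHLD, &old, NULL);
        return result;
    }
    if (pid == 0)
        _exit(0);

    /* Retry only on EINTR; any other failure is reported to the caller. */
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    result = (r == pid) ? 0 : errno;

    sigaction(SIGCHLD, &old, NULL);   /* restore the caller's disposition */
    return result;
}
```

With the default disposition the child stays a zombie until reaped and the call returns 0; with SIGCHLD ignored the kernel discards it and waitpid() returns ECHILD.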

At least, that's been my experience under SCO OS 5.0.5. This behaviour
is also visible under Linux if you use the built-in entropy generation
code. SCO OS 5.0.5, however, also fills the log with another error
message that Linux doesn't show.

The sshd children (again, not the master daemon, just the daemons spawned
to handle connections) generate these error messages:
May 19 09:32:15 ohare sshd[16872]: error: Command '/bin/df -i': select() failed: Interrupted system call

These errors immediately precede the "Couldn't wait for child" messages,
and I assume they have the same cause.
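"Interrupted system call" is EINTR: a caught signal (SIGCHLD from a finishing command is one candidate, though that is an assumption here) arrived while select() was waiting. That would also fit the observation above that raising the timeout made no difference. A common way to keep such a wait from being cut short is to restart select() with whatever time is left. The sketch below shows that pattern; wait_readable is a hypothetical name and this is not the OpenSSH entropy code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <time.h>

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Wait up to timeout_ms for fd to become readable, restarting select()
 * whenever a signal interrupts it. Returns 1 if readable, 0 on timeout,
 * -1 on any error other than EINTR.
 */
int wait_readable(int fd, long timeout_ms)
{
    struct timespec start;
    struct timeval tv;
    fd_set rfds;
    long left;
    int r;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        left = timeout_ms - elapsed_ms(&start);
        if (left < 0)
            left = 0;                 /* final zero-timeout poll */
        tv.tv_sec = left / 1000;
        tv.tv_usec = (left % 1000) * 1000;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        r = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (r >= 0)
            return r > 0 ? 1 : 0;
        if (errno != EINTR)
            return -1;
        /* Interrupted by a signal: loop and wait only for the remainder. */
    }
}
```

Recomputing the remaining time on each pass keeps repeated interruptions from stretching the total wait beyond timeout_ms.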

I realize I didn't supply any patches for the first two issues
(MAXPATHLEN, pty code). I'm not yet familiar with autoconf, and I only
have access to SCO machines while I'm at work (where I have a long list
of things the boss actually wants me to be working on). However, I will
be looking into the failed waitpid calls (under Linux) and can hopefully
figure it out this weekend.



Bladt Norbert wrote:
> 
> > John Horne [SMTP:J.Horne at plymouth.ac.uk] wrote:
> >
> > Emanuel Borsboom <emanuel at heatdeath.org> wrote:
> >> Trying to install the portable OpenSSH on Solaris 2.6.  Compiling from
> >> openssh-2.1.0.tar.gz using gcc.  Compiles and installs fine.  sshd
> >> starts fine.  First connection from another system works.  Child sshd is
> >> forked, but the parent dies and logs:
> >>
> >> May 16 11:40:56 qtrade-dev sshd[6510]: error: Couldn't wait for child '/usr/bin/ls -alni' completion: No child processes
> >> May 16 11:40:56 qtrade-dev last message repeated 3 times
> >> May 16 11:40:56 qtrade-dev sshd[6510]: error: -1 Command '/usr/bin/ls -alni': select() failed: Interrupted system call
> >> May 16 11:40:56 qtrade-dev sshd[6510]: error: Couldn't wait for child '/usr/bin/ls -alni' completion: No child processes
> >>
> >[rest snipped]
> 
> > I too get this on a Sun Ultra 10, Solaris 8 using SSL 0.9.5a; SSH 2.1.0
> > and gcc version 2.95.2. I'll take a look, but don't expect anything since
> > I'm not really a C programmer! (sorry)
> Me too on Solaris 7.
> However, I am a C programmer and I was able to fix it.
> The timeout ("interrupted system call" message above)
> occurs because the timeout for the entropy commands is
> too small (100 msec).
> I raised it to 2000 msec (500 msec was too small, too)
> and now it works without these error messages.
> The "No child processes" messages are a consequence of the
> interrupted system call.
> 
> The location to fix is in config.h:
> 
> /* Builtin PRNG command timeout */
> #define ENTROPY_TIMEOUT_MSEC 100
> 
> I changed the original 100 to 2000, did a "make sshd" and that's it.
> 
> Hope this helps,
> 
> Norbert.
> 
> P.S. The real fix for the next release would be to either
> ask for the timeout value, determine it automagically in
> some way or change the hard-coded value of 100 in the "configure"
> script to something more reasonable.
> 
> --
> Norbert Bladt
> ATAG debis Informatik, TZ1 - Z364
> Industriestrasse 1, CH 3052-Zollikofen
> E-Mail: norbert.bladt at adi.ch Tel.: +41 31 915 3964 Fax: +41 31 915 3640

-- 
Aran Cox
Engineering
Telegroup Coralville - Coral Center





More information about the openssh-unix-dev mailing list