Failure to Launch (was override -q option)

Laurence Marks L-marks at northwestern.edu
Sun Jul 21 09:02:14 EST 2013


Hmmmm. I guess I can create a script to run "strace ssh @". I will
have to tweak a few scripts, since I don't want to disturb jobs which
are currently running happily using openmpi/mpirun to replace ssh.

Can you suggest which flags etc. to use with strace? I would prefer to
get this right (tomorrow), since I will have to start a few (2-4)
not-so-useful jobs and then wait 10-36 hrs for them to work through
the job queue, hoping that one hits the 0.1-0.2% failure rate. (I
assume it won't be useful to attach strace to an existing zombie.)
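
For concreteness, the wrapper I have in mind would look roughly like
the sketch below. This is only a guess at sensible settings, not a
tested recipe: -f (follow child processes), -tt (timestamps with
microseconds) and -o (write the trace to a file) are standard strace
options, but the function name trace_ssh, the SSH_TRACE_DIR and
SSH_TRACE_DRY_RUN variables, and the log file naming are all made up
for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around ssh; every name here is illustrative only.

# Assumed log location, overridable through the environment.
LOGDIR="${SSH_TRACE_DIR:-/tmp/ssh-trace}"

trace_ssh() {
    # One trace file per wrapper invocation, keyed by the shell's PID.
    log="$LOGDIR/ssh.$$.trace"
    if [ -n "${SSH_TRACE_DRY_RUN:-}" ]; then
        # Dry run: print the command instead of running it.
        echo strace -f -tt -o "$log" ssh "$@"
    else
        # -f   follow child processes
        # -tt  timestamp each line with microsecond resolution
        # -o   write the trace to a file instead of stderr
        mkdir -p "$LOGDIR" && exec strace -f -tt -o "$log" ssh "$@"
    fi
}
```

A wrapper script would end with a call to trace_ssh "$@" so that the
arguments mpirun passes are handed on to ssh unchanged. strace prints
signals as well as syscalls by default, so the end of the trace file
should show whether ssh was killed from outside.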

On Sat, Jul 20, 2013 at 5:43 PM, Damien Miller <djm at mindrot.org> wrote:
> On Sat, 20 Jul 2013, Laurence Marks wrote:
>
>> You are right, not sure how that happened. I will try again, the last
>> lines are five of
>> "/hpc/opt/intel/composer_xe_2013/mkl/bin/mklvars.sh: line 93: manpath:
>> command not found
>> debug2: channel 0: written 88 to efd 13"
>
> I see those lines in the log file, but there is nothing to indicate
> ssh has exited at this point. As far as can be ascertained from the
> logs, it is either still running happily, has crashed with a SEGV
> (which seems unlikely given you've tried two different versions) or
> has been killed with an untrappable signal by its parent process.
>
> If there is no more log output and the ssh processes are ending up dead
> or zombied, then I'd suggest running strace on one of the ssh processes
> to watch which syscalls it is making and any signals it is receiving.
>
> -d
>



-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
Albert Szent-Györgyi


More information about the openssh-unix-dev mailing list