Failure to Launch (was override -q option)

Laurence Marks L-marks at northwestern.edu
Mon Jul 22 23:21:12 EST 2013


N.B., I can confirm that this is certainly ssh related somehow. My
hack of replacing the ssh connection with openmpi/mpirun is stable,
and ~15 jobs have now run for 24-36 hrs without any problems. Of
course this does not mean that it is an openssh problem, but it links
the failure clearly in my mind to something associated with ssh that
openmpi/mpirun avoids.

On Mon, Jul 22, 2013 at 8:11 AM, Laurence Marks
<L-marks at northwestern.edu> wrote:
> Murphy's law. I ran 345 repeats of a shorter mpi code and did not find
> any issue. I am trying a loop of two different mpi codes, and to date
> nothing.
>
> It may be that I will need to run a number of similar jobs in parallel
> which is tricky to set up reliably with a queueing system. One
> question: is there any conceivable way that 10-20 tasks all trying
> to connect via ssh at the same time could cause an issue? They
> would all be accessing the same $HOME/.ssh directory, but different
> syslog files. (In case it matters, the compute nodes are diskless.)
>
> On Sun, Jul 21, 2013 at 8:45 AM, Laurence Marks
> <L-marks at northwestern.edu> wrote:
>> Thanks. After a bit of tweaking (including finding where strace was
>> hidden on the compute nodes) I am running 2000 repeats of the shortest
>> of the three mpi tasks. Hopefully it will hang....
>>
>> On Sun, Jul 21, 2013 at 2:52 AM, Darren Tucker <dtucker at zip.com.au> wrote:
>>> On Sun, Jul 21, 2013 at 5:48 PM, Darren Tucker <dtucker at zip.com.au> wrote:
>>> [...]
>>>> The other thing that I'd suggest is using 6.2p2 and the newly-added -E
>>>> option to write the debug logs to separate files, ie "ssh -E
>>>> ssh.$$.log" ...
>>>
>>> Oh, hang on: -E was added after 6.2p2.  You could still redirect stderr
>>> to separate log files (i.e. 2> ssh.$$.log), although that will contain
>>> both the ssh debug logs and stderr from the program being run.
>>>
>>> --
>>> Darren Tucker (dtucker at zip.com.au)
>>> GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4  37C9 C982 80C7 8FF4 FA69
>>>     Good judgement comes with experience. Unfortunately, the experience
>>> usually comes from bad judgement.
>>
>>
>>
>> --
>> Professor Laurence Marks
>> Department of Materials Science and Engineering
>> Northwestern University
>> www.numis.northwestern.edu 1-847-491-3996
>> "Research is to see what everybody else has seen, and to think what
>> nobody else has thought"
>> Albert Szent-Gyorgi



-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
Albert Szent-Gyorgi


More information about the openssh-unix-dev mailing list