Failure to Launch (was override -q option)
Laurence Marks
L-marks at northwestern.edu
Mon Jul 22 23:11:40 EST 2013
Murphy's law. I ran 345 repeats of a shorter MPI code and did not find
any issue. I am now trying a loop of two different MPI codes, and to
date nothing.
It may be that I will need to run a number of similar jobs in parallel,
which is tricky to set up reliably with a queueing system. One
question: if 10-20 tasks are all trying to connect via ssh at the same
time, is there any conceivable way that this could cause an issue? They
would all be accessing the same $HOME/.ssh directory, but different
syslog files. (In case it matters, the compute nodes are diskless.)
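For what it is worth, a minimal sketch of the kind of concurrent test I have in mind is below. It is only an illustration, not something I have run on the cluster: the function name run_parallel, the log directory layout and the host name in the usage line are all made up. It starts N copies of a command at once, sends each copy's stderr to its own log file (as suggested earlier in the thread), and reports any copy that exits non-zero.

```shell
#!/bin/sh
# Hypothetical stress-test helper (name and log layout are assumptions).
# Usage: run_parallel N LOGDIR COMMAND [ARGS...]
# Starts N copies of COMMAND at the same time, each with its own stderr
# log, then waits for all of them and reports the ones that failed.
run_parallel() {
    n=$1
    logdir=$2
    shift 2
    mkdir -p "$logdir" || return 1

    pids=""
    i=1
    while [ "$i" -le "$n" ]; do
        # One log per task. With "ssh -v" this file holds both the ssh
        # debug output and any stderr from the remote program.
        "$@" 2> "$logdir/task.$i.log" &
        pids="$pids $!"
        i=$((i + 1))
    done

    failed=0
    i=1
    for pid in $pids; do
        # wait returns the exit status of that particular task.
        if ! wait "$pid"; then
            echo "task $i failed, see $logdir/task.$i.log"
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done

    echo "$failed of $n tasks failed"
    [ "$failed" -eq 0 ]
}
```

Something like "run_parallel 20 ./ssh-logs ssh -n -v -o BatchMode=yes node01 true" would then open 20 connections at once (node01 is a placeholder). Note that a task which hangs rather than fails will simply make the wait block, so the log for the task that never finishes is the one to look at.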
On Sun, Jul 21, 2013 at 8:45 AM, Laurence Marks
<L-marks at northwestern.edu> wrote:
> Thanks. After a bit of tweaking (including finding where strace was
> hidden on the compute nodes) I am running 2000 repeats of the shortest
> of the three mpi tasks. Hopefully it will hang....
>
> On Sun, Jul 21, 2013 at 2:52 AM, Darren Tucker <dtucker at zip.com.au> wrote:
>> On Sun, Jul 21, 2013 at 5:48 PM, Darren Tucker <dtucker at zip.com.au> wrote:
>> [...]
>>> The other thing that I'd suggest is using 6.2p2 and the newly-added -E
>>> option to write the debug logs to separate files, ie "ssh -E
>>> ssh.$$.log" ...
>>
>> oh hang on, -E was added after 6.2p2. You could still redirect stderr
>> to separate log files (ie 2> ssh.$$.log) although that will contain
>> both debug logs and stderr from the program being run.
>>
>> --
>> Darren Tucker (dtucker at zip.com.au)
>> GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
>> Good judgement comes with experience. Unfortunately, the experience
>> usually comes from bad judgement.
>
>
>
> --
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu 1-847-491-3996
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought"
> Albert Szent-Gyorgi
More information about the openssh-unix-dev mailing list