Failure to Launch (was override -q option)

Laurence Marks laurence.marks at gmail.com
Sat Jul 20 22:07:08 EST 2013


Attached is the very verbose ssh output. Just to be perverse, this time two
nodes lost connectivity. The only thing I see is lines saying that the two
connections were lost, although, to be honest, I have no idea what everything
else means. For reference, 8 ssh connections were being made at the same
time for an 8x8 MPI task.

N.B.: since the OS I am using does not have rsh, I am currently using the
Open MPI mpirun to replace ssh as the launcher while still using Intel MPI
(impi) for communications. While a gross hack, this seems to be reliable,
indicating that the issue really is ssh-related. However, since the problem
only occurs for 0.1-0.2% of the connections, I need to let 10 jobs run for
a day or so more before I can be certain.
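The need for a long test run follows from simple arithmetic. The sketch below is an illustration only, not something from the original report. It assumes that ssh connection failures are independent and takes the per-connection failure rate p from the 0.1-0.2% figure above. It computes the chance that one launch of 8 simultaneous connections sees at least one failure, 1 - (1 - p)^8, and the number n of consecutive failure-free connections needed before (1 - p)^n drops to 5% or below, i.e. before a clean run would be unlikely if the old failure rate still applied. The function names and the 95% threshold are hypothetical choices made for the example.

```python
import math


def job_failure_prob(p: float, connections: int = 8) -> float:
    """Chance that at least one of `connections` independent ssh
    connections fails, given a per-connection failure rate p (assumed)."""
    return 1.0 - (1.0 - p) ** connections


def clean_connections_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n with (1 - p)**n <= 1 - confidence: after n failure-free
    connections, a rate of p or higher is rejected at that confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    for p in (0.001, 0.002):  # the 0.1% and 0.2% rates quoted above
        n = clean_connections_needed(p)
        print(f"p={p}: per-launch failure chance {job_failure_prob(p):.4f}, "
              f"{n} clean connections (~{math.ceil(n / 8)} launches of 8)")
```

For p = 0.1% this gives 2995 failure-free connections (about 375 launches of 8), and for p = 0.2% it gives 1497. Both are close to the familiar "rule of three" estimate n ≈ 3/p, which is why a day or more of repeated runs is needed before drawing a conclusion.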

--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
Albert Szent-Györgyi


More information about the openssh-unix-dev mailing list