Failure to Launch (was override -q option)
laurence.marks at gmail.com
Sat Jul 20 22:07:08 EST 2013
Attached is the very verbose ssh output. Just to be perverse, this time two
nodes lost connectivity. The only thing I see is lines saying that the two
connections were lost, although, to be honest, I have no idea what everything
else means. For reference, 8 ssh connections were being made at the same
time for an 8x8 MPI task.
N.B., since the OS I am using does not have rsh, I am currently using the
Open MPI mpirun in place of ssh as the launcher, while still using Intel
impi for communications. While a gross hack, this seems to be reliable,
which suggests that the issue really is ssh-related. However, since the
problem only occurs for 0.1-0.2% of the connections, I need to let 10 jobs
run for a day or so more before I can be certain.
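A rough way to size "a day or so more": if each connection failed independently with a fixed probability p (an assumption; real failures may cluster), then seeing zero failures in n connections has probability (1 - p)^n. The sketch below finds the smallest n that pushes this below a chosen alpha. The 5% level and the function name `required_connections` are hypothetical choices for illustration, not anything from the original report.

```python
import math


def required_connections(p_fail: float, alpha: float = 0.05) -> int:
    """Smallest n with (1 - p_fail)**n <= alpha.

    Assumes (hypothetically) that each connection fails independently
    with probability p_fail. If the old failure rate still applied,
    n failure-free connections would then occur with probability <= alpha.
    """
    if not (0.0 < p_fail < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p_fail and alpha must lie strictly between 0 and 1")
    # log1p(-p) computes ln(1 - p) accurately for small p
    return math.ceil(math.log(alpha) / math.log1p(-p_fail))


if __name__ == "__main__":
    for p in (0.001, 0.002):  # the 0.1-0.2% failure rates quoted above
        n = required_connections(p)
        launches = math.ceil(n / 8)  # 8 ssh connections per 8x8 launch
        print(f"p = {p:.3f}: {n} failure-free connections (~{launches} launches)")
```

Under these assumptions, roughly 1,500 to 3,000 failure-free connections (about 190 to 375 launches of 8 connections each) would be needed before the hack could be called reliable at the 5% level.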
Professor Laurence Marks
Department of Materials Science and Engineering
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
More information about the openssh-unix-dev mailing list