[Bug 3312] New: Behavior change in OpenSSH8.1 compared to OpenSSH7.5
bugzilla-daemon at mindrot.org
bugzilla-daemon at mindrot.org
Tue May 18 00:26:46 AEST 2021
https://bugzilla.mindrot.org/show_bug.cgi?id=3312
Bug ID: 3312
Summary: Behavior change in OpenSSH8.1 compared to OpenSSH7.5
Product: Portable OpenSSH
Version: 8.1p1
Hardware: PPC
OS: AIX
Status: NEW
Severity: major
Priority: P5
Component: sshd
Assignee: unassigned-bugs at mindrot.org
Reporter: mayasha9 at in.ibm.com
Hi
I recently build and install Openssh8.1 and I found out some strange
behavior compared to OpenSSH 7.5.
I want to get your opinion on this change in behavior between 7.5 and
8.1 OpenSSH.
Problem title:
8.1 OpenSSH waits for command run in background
Problem description:
Using ssh to run a command in the background will cause the ssh session
to stay open until the command is done executing, if using OpenSSH 8.1.
For example:
ssh -n user at hostname 'nohup /tmp/dosleep > /dev/null & > /dev/null'
...if /tmp/dosleep has a sleep of 30 seconds, this session will stay
open for 30 seconds. At earlier levels of OpenSSH, the ssh session
would exit immediately since the command is being run in the
background.
Analysis so far:
This test is run using this script:
#!/bin/ksh
exec 1>/tmp/dosleep.out
sleep 10
lslpp -L
Run:
# time (ssh -n root at tele1 'nohup /tmp/dosleep > /dev/null & >
/dev/null')
...output will show it paused for 10 seconds:
real 0m11.30s
user 0m0.12s
sys 0m0.00s
...whereas using OpenSSH 7.5 version of sshd will show more like "real
0m0.94s" as the ssh session is allowed to exit immediately.
Redirecting stderr to null will allow it to exit immediately even when
using 8.X:
# time (ssh -n root at tele1 'nohup /tmp/dosleep > /dev/null 2>&1 & >
/dev/null')
real 0m0.94s
user 0m0.12s
sys 0m0.00s
sshd debug will show that we pause here:
debug2: notify_done: reading
debug2: channel 0: read<=0 rfd 13 len 0
debug2: channel 0: read failed
debug2: channel 0: chan_shutdown_read (i0 o3 sock -1 wfd 13 efd 15
[read])
debug2: channel 0: input open -> drain
debug2: channel 0: ibuf_empty delayed efd 15/(0)
<pause here>
In file, channels.c (OpenSSH8.1), If I do the following change,
OpenSSH8.1 behaves same as what we saw in ssh 7.5 starts working.
if ((len = sshbuf_len(c->input)) == 0) {
if (c->istate == CHAN_INPUT_WAIT_DRAIN) {
- /*
- * input-buffer is empty and read-socket
shutdown:
- * tell peer, that we will not send more data:
- * send IEOF.
- * hack for extended data: delay EOF if EFD
still
- * in use.
- */
- if (CHANNEL_EFD_INPUT_ACTIVE(c))
- debug2("channel %d: "
- "ibuf_empty delayed efd %d/(%zu)",
- c->self, c->efd,
sshbuf_len(c->extended));
- else
- chan_ibuf_empty(ssh, c);
+ chan_ibuf_empty(ssh, c);
}
return;
}
It seems to think we have an input - an 'extended file descriptor /
EFD" - active still that we are waiting on input from. efd 15 is
opened for the stderr of the command.
This might mean we still think we have input to read from stderr -
since we see that, by redirecting stderr to stdout, the problem does
not happen.
It says this is a 'hack for extended data', which is a bit worrying.
This happens whether the program being run is a ksh shell script or a
compiled C program.
Testing by adding a circumvention and calling chan_ibuf_empty even if
we do hit that CHANNEL_EFD_INPUT_ACTIVE condition will avoid the
problem, although that would undo the 'hack' that is in place.
It would appear the logic is not correct that leads us to wait on this
file descriptor since we are running the command with nohup and in the
background. Even if something in the script were to write to stderr
after our sleep has happened, we shouldn't be displaying that on the
client side, since the nohup ought to prevent things from writing to
the screen.
But then, if I change my test script to this:
#!/bin/ksh
exec 1>/tmp/dosleep.out
sleep 10
lslpp -L
echo "this is going to stderr" >&2
...and run this:
# time (ssh -n root at tele1 'nohup /tmp/dosleep >/dev/null & >
/dev/null')
...the client side actually sees what the script is writing to stderr:
<10 second pause happens here, then we see: >
this is going to stderr
real 0m11.47s
user 0m0.12s
sys 0m0.00s
I wouldn't have expected we'd see that since we are nohup'ing it.
Testing the same way with 7.5 sshd, we do not see the 'this is going to
stderr' output.
What do you think, is this change in behavior a defect?
Thanks
Mayank
--
You are receiving this mail because:
You are watching the assignee of the bug.
More information about the openssh-bugs
mailing list