Race condition when using ControlMaster=auto with simultaneous connections

Thu Sep 1 13:37:12 AEST 2022

On 8/31/22 09:24, Baptiste Jonglez wrote:
> Hello,
> 
> I'm trying to multiplex many simultaneous SSH connections through a single
> master connection, and I'm hitting a race condition while doing this.
> This is not a bug; I'm either hitting a limit in the design of OpenSSH or
> misusing it.
> 
> The use-case is to use Ansible to configure many hosts simultaneously,
> while all connections need to go through a single "SSH bastion" via ProxyJump.
> For efficiency and to avoid hitting MaxStartups limits, I would like to
> use a control master for the connection to the bastion, via the following
> client configuration:
> 
>     Host bastion.example.com
>       ControlMaster auto
>       ControlPath /dev/shm/ssh-%h
>       ControlPersist 30
> 
>     Host !bastion.example.com *.example.com
>       ProxyJump bastion.example.com
> 
> However, this does not work when making simultaneous connections: all SSH
> connections create a new, separate connection to the bastion.  Here is a
> simple way to reproduce:
> 
>     $ for i in {1..3}; do ssh myhost.example.com "sleep 1" & done
>     ControlSocket /dev/shm/ssh-bastion.example.com already exists, disabling multiplexing
>     ControlSocket /dev/shm/ssh-bastion.example.com already exists, disabling multiplexing
> 
> What happens is the following:
> 
> 1) each SSH process tries to connect to the control socket and fails
>    (this is expected, the control socket is not yet bound)
> 
> 2) each SSH process then creates a new SSH connection
> 
> 3) once connected, each process tries to bind to the control socket
> 
> 4a) one process successfully binds the control socket
> 4b) all other processes fail to bind the control socket (error message above)
> 
> 5) in both cases, each process is now using its own separate SSH connection to the bastion
> 
> The window for the race condition is between 1) and 4), so it's rather
> large: it includes the time to establish a new SSH connection.
> 
> I believe that taking a lock between steps 1) and 4) could solve the issue:
> 
> 1.1) each process tries to take an exclusive lock related to the control socket
> 1.1a) one process gets the lock and can continue creating an SSH connection
> 1.1b) all other processes wait on the lock; when the lock is released, they
>       go back to step 1) to connect to the control socket
> 
> 4.1) once the control socket has been bound, the "lucky process" releases the lock
> 
> Does it make sense?  Would the project accept a patch implementing this as
> an additional option?
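
For illustration, the locking scheme proposed above could look roughly like
the following Python sketch. It is not OpenSSH code: the control socket is
simulated by a plain file, the SSH handshake by a short sleep, and the
".lock" suffix plus the names connect_with_lock and run_demo are made up for
this example. It only shows that an exclusive flock() taken between steps 1)
and 4) leaves a single process creating the master.

```python
import fcntl
import os
import tempfile
import threading
import time


def connect_with_lock(control_path, results):
    """Simulate one client following steps 1, 1.1, 4 and 4.1 of the proposal."""
    lock_path = control_path + ".lock"  # hypothetical lock file name
    while True:
        # Step 1: try to use an existing control socket (simulated by a file).
        if os.path.exists(control_path):
            results.append("reused")
            return
        # Step 1.1: take an exclusive lock tied to the control socket.
        # Each open() yields its own open file description, so flock()
        # serialises callers even within a single process.
        with open(lock_path, "w") as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            if os.path.exists(control_path):
                # Step 1.1b: the socket appeared while we waited on the lock;
                # leaving the "with" block drops the lock, then retry step 1.
                continue
            # Step 1.1a: we hold the lock, so we create the connection.
            time.sleep(0.1)  # stands in for the SSH handshake
            # Step 4: "bind" the control socket.
            open(control_path, "w").close()
            results.append("master")
            # Step 4.1: the lock is released when lock_file is closed.
            return


def run_demo(n_clients=3):
    """Start n_clients simultaneous clients and report what each one did."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        control_path = os.path.join(tmp, "ssh-bastion.example.com")
        threads = [
            threading.Thread(target=connect_with_lock, args=(control_path, results))
            for _ in range(n_clients)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return results
```

Calling run_demo(3) should report one "master" and two "reused" entries.
An flock() lock is dropped by the kernel when its holder exits, so a client
that dies during the handshake would not leave a stale lock behind. How
this would interact with ControlPersist and real socket binding in ssh
itself is not covered by the sketch.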

Not sure if this is related, but I would like to have an option to *only* use the
control socket.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)