Parallel transfers with sftp (call for testing / advice)

Cyril Servant cyril.servant at gmail.com
Thu Apr 9 01:30:43 AEST 2020


Hello, I'd like to share with you a change I made to sftp.

1. The need

I work at CEA (Commissariat à l'énergie atomique et aux énergies
alternatives) in France. We run a compute cluster complex, and our customers
regularly need to transfer big files to and from the cluster. Each of our front
nodes has an outgoing bandwidth limit (let's say 1Gb/s each, generally limited
more by the CPU than by the network bandwidth), but the total interconnection
to the customer is higher (let's say 10Gb/s). All front nodes share a
distributed file system on an internal high-bandwidth network. So the contention
point is the 1Gb/s limit of a single connection. Customers who want to use more
than 1Gb/s currently use GridFTP. We want to provide our customers with a
solution based on ssh.

2. The solution

I made some changes to the sftp client. The new option "-n" (defaults to 0) sets
the number of extra channels. There is one main ssh channel, and n extra
channels. The main ssh channel does everything except the put and get commands,
which are parallelized over the n extra channels. Thanks to this, a customer who
uses "-n 5" can transfer files at up to 5Gb/s. There is no server-side change;
everything is done on the client side.
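
As a back-of-the-envelope check of the "-n 5" figure, here is a small Python
sketch of the expected upper bound. It assumes that each extra channel is capped
at the per-node limit and that the total is capped by the customer link; the
function name is hypothetical, the default figures are only the "let's say"
values from above, and the fallback to a single channel for "-n 0" is my own
simplification.

```python
def expected_throughput_gbps(extra_channels, per_channel_gbps=1.0, link_gbps=10.0):
    """Rough upper bound on aggregate put/get throughput, in Gb/s."""
    # Assumption: with no extra channels, the transfer uses a single channel.
    active = max(extra_channels, 1)
    # Assumption: channels scale linearly until the customer link saturates.
    return min(active * per_channel_gbps, link_gbps)


print(expected_throughput_gbps(5))   # five extra channels at 1Gb/s each
print(expected_throughput_gbps(20))  # capped by the 10Gb/s interconnection
```

This is only an upper bound; real transfers will also depend on CPU load and on
the distributed file system.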

3. Some details

Each extra channel has its own ssh channel and its own thread. The main channel
sends orders to the threads via a queue. When the user issues a get or put
request, the main channel decides what to do. If the file is small enough, a
single order is added to the queue. If the file is big, the main channel writes
the last block of the file (in order to create a sparse file), then adds
multiple orders to the queue. Each of these orders is a put (or get) of one
chunk of the file. One notable change is the progress meter (in interactive
mode). There is no longer one progress meter per file; there is now a single
progress meter which shows the name of the last dequeued file and the total
number of transferred bytes.
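
To make the flow above easier to picture, here is a minimal Python sketch of the
same idea. It is not the C code from the branch: it copies a local file instead
of talking to an sftp server, and every name in it (CHUNK_SIZE, plan_orders,
preallocate, worker, parallel_copy) is hypothetical, as is the tiny chunk size.

```python
import os
import queue
import threading

CHUNK_SIZE = 4  # hypothetical threshold and chunk size, tiny for illustration


def plan_orders(name, size, chunk_size=CHUNK_SIZE):
    """Split one transfer into (name, offset, length) orders."""
    if size <= chunk_size:
        return [(name, 0, size)]  # small file: one simple order
    return [(name, off, min(chunk_size, size - off))
            for off in range(0, size, chunk_size)]


def preallocate(path, size):
    """Write only the last byte so the file has its final size up front."""
    with open(path, "wb") as f:
        if size > 0:
            f.seek(size - 1)
            f.write(b"\0")


def worker(orders, src, dst, done, lock):
    """Dequeue orders and copy each chunk at its own offset."""
    while True:
        order = orders.get()
        if order is None:  # sentinel: no more work for this thread
            break
        _, offset, length = order
        with open(src, "rb") as fin, open(dst, "r+b") as fout:
            fin.seek(offset)
            fout.seek(offset)
            fout.write(fin.read(length))
        with lock:
            done[0] += length  # one shared total, like the single progress meter


def parallel_copy(src, dst, n_workers=3):
    """Copy src to dst in chunks using n_workers threads; return bytes copied."""
    size = os.path.getsize(src)
    preallocate(dst, size)
    orders = queue.Queue()
    done, lock = [0], threading.Lock()
    threads = [threading.Thread(target=worker,
                                args=(orders, src, dst, done, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for order in plan_orders(src, size):
        orders.put(order)
    for _ in threads:
        orders.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return done[0]
```

Because each order carries its own offset, the chunks can complete in any order;
the file is sized before any chunk is written, so each worker can seek directly
to its offset.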

4. Any thoughts?

You will find the code here:
    https://github.com/cea-hpc/openssh-portable/tree/parallel_sftp
The branch parallel_sftp is based on the tag V_8_2_P1. There may be a lot of
newbie mistakes in the code; I'll gladly take any advice and criticism, and I'm
open-minded. Finally, if there is even the slightest chance for these changes to
be merged upstream, please show me the path.

Thank you,
-- 
Cyril


More information about the openssh-unix-dev mailing list