Parallel transfers with sftp (call for testing / advice)
cyril.servant at gmail.com
Fri Apr 10 01:01:48 AEST 2020
> On 9 Apr 2020, at 00:34, Nico Kadel-Garcia <nkadel at gmail.com> wrote:
> On Wed, Apr 8, 2020 at 11:31 AM Cyril Servant <cyril.servant at gmail.com> wrote:
>> Hello, I'd like to share with you an evolution I made on sftp.
> It *sounds* like you should be using parallelized rsync over xargs.
> Partial sftp or scp transfers are almost inevitable in bulk transfers
> over a crowded network, and sftp does not have good support for
> "mirroring", only for copying content.
> See https://stackoverflow.com/questions/24058544/speed-up-rsync-with-simultaneous-concurrent-file-transfers
This solution is perfect for sending a lot of files in parallel. But when
sending one really big file, it does not improve transfer speed.
>> I'm working at CEA (Commissariat à l'énergie atomique et aux énergies
>> alternatives) in France. We have a compute cluster complex, and our customers
>> regularly need to transfer big files from and to the cluster. Each of our front
>> nodes has an outgoing bandwidth limit (let's say 1Gb/s each, generally more
>> limited by the CPU than by the network bandwidth), but the total interconnection
>> to the customer is higher (let's say 10Gb/s). Each front node shares a
>> distributed file system on an internal high bandwidth network. So the contention
>> point is the 1Gb/s limit of a connection. If the customer wants to use more than
>> 1Gb/s, he currently uses GridFTP. We want to provide a solution based on ssh to
>> our customers.
>> 2. The solution
>> I made some changes in the sftp client. The new option "-n" (defaults to 0) sets
>> the number of extra channels. There is one main ssh channel, and n extra
>> channels. The main ssh channel does everything, except the put and get commands.
>> Put and get commands are parallelized on the n extra channels. Thanks to this,
>> when the customer uses "-n 5", he can transfer his files at up to 5Gb/s. There
>> is no server-side change. Everything is done on the client side.
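A back-of-the-envelope check of the figures quoted above, as a minimal sketch.
It assumes the "let's say" numbers from the message (about 1Gb/s per connection,
10Gb/s for the shared link), assumes throughput scales linearly with the number
of channels until the link saturates, and assumes that with "-n 0" the main
channel carries the transfer itself. The function name and parameters are
hypothetical and are not part of the patch.

```python
def estimated_throughput_gbps(extra_channels, per_channel_gbps=1.0, link_gbps=10.0):
    """Rough upper bound on transfer speed for a given "-n" value."""
    # Assumption: with no extra channel, the main channel does the transfer.
    transfer_channels = max(extra_channels, 1)
    # Assumption: linear scaling, capped by the shared link to the customer.
    return min(transfer_channels * per_channel_gbps, link_gbps)
```

Under these assumptions, "-n 5" gives the 5Gb/s mentioned in the message, and
anything beyond "-n 10" would be capped by the 10Gb/s interconnection.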
> While the option sounds useful for niche cases, I'd be leery of
> partial transfers and being compelled to replicate content to handle
> partial transfers. rsync has been very good, for years, in completing
> partial transfers.
I can fully understand this. In our case, the network is not really crowded, as
customers are generally using research / educational links. Indeed, this is
totally a niche case, but still a need for us. The main use case is putting data
you want to process into the cluster, and when the job is finished, getting the
output of the process. There is rarely the need for synchronising files, except
for the code you want to execute on the cluster, which is considered small
compared to the data. rsync is the obvious choice for synchronising the code,
but not for putting / getting huge amounts of data.
The only other ssh-based tool that can speed up the transfer of one big file is
lftp, and it only works for get commands, not for put commands.
>> 3. Some details
>> Each extra channel has its own ssh channel, and its own thread. Orders are sent
>> by the main channel to the threads via a queue. When the user sends a get or put
>> request, the main channel checks what to do. If the file is small enough, one
>> simple order is added to the queue. If the file is big, the main channel writes
>> the last block of the file (in order to create a sparse file), then adds
>> multiple orders to the queue. Each of these orders is a put (or get) of one
>> chunk of the file. One notable change is the progress meter (in interactive
>> mode). There is no longer one progress meter per file; there is now a single
>> progress meter which shows the name of the last dequeued file, and a total of
>> transferred bytes.
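The queue-and-chunks scheme described in section 3 can be sketched as follows.
This is not the code from the patch (which is C inside the sftp client); it is
a local-file illustration in Python where each thread stands in for one extra
channel. CHUNK_SIZE, plan_orders, worker and parallel_copy are hypothetical
names, and the chunk size is an arbitrary value chosen for the example.

```python
import os
import queue
import threading

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical value; the patch's threshold is not stated


def plan_orders(path, size, chunk_size=CHUNK_SIZE):
    """Return the (path, offset, length) orders needed to transfer one file."""
    if size <= chunk_size:
        # Small file: one simple order covers the whole file.
        return [(path, 0, size)]
    return [(path, off, min(chunk_size, size - off))
            for off in range(0, size, chunk_size)]


def worker(orders, src, dst):
    """Stand-in for one extra channel: copy chunks until a None sentinel arrives."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        while True:
            order = orders.get()
            if order is None:
                break
            _path, offset, length = order
            fin.seek(offset)
            data = fin.read(length)
            fout.seek(offset)
            fout.write(data)


def parallel_copy(src, dst, n_workers=4, chunk_size=CHUNK_SIZE):
    """Copy src to dst using n_workers threads fed from a shared queue."""
    if n_workers < 1:
        raise ValueError("this sketch needs at least one worker")
    size = os.path.getsize(src)
    pending = plan_orders(src, size, chunk_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        if len(pending) > 1:
            # Big file: the main thread writes the last block first, which
            # extends the destination to its final size (sparse where supported).
            _path, offset, length = pending.pop()
            fin.seek(offset)
            fout.seek(offset)
            fout.write(fin.read(length))
    orders = queue.Queue()
    for order in pending:
        orders.put(order)
    for _ in range(n_workers):
        orders.put(None)  # one stop sentinel per worker
    threads = [threading.Thread(target=worker, args=(orders, src, dst))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In this sketch, parallel_copy("big.dat", "copy.dat", n_workers=5) plays the role
of a put with "-n 5". Because every order carries its own offset, the workers
never write to the same region, which is why the destination has to exist at
its full size before the chunk orders are queued.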
>> 4. Any thoughts?
>> You will find the code here:
>> The branch parallel_sftp is based on the tag V_8_2_P1. There may be a lot of
>> newbie mistakes in the code; I'll gladly take any advice and criticism, I'm
>> open-minded. And finally, if there is even the slightest chance for these
>> changes to be merged upstream, please show me the path.
More information about the openssh-unix-dev