Parallel transfers with sftp (call for testing / advice)

Tue May 5 20:55:10 AEST 2020

Peter Stuge wrote:
> 
> Matthieu Hautreux wrote:
>> The change proposed by Cyril in sftp is a very pragmatic approach to 
>> deal with parallelism at the file transfer level. It leverages the 
>> already existing sftp protocol and its capability to write/read file 
>> content at specified offsets. This makes it possible to speed up sftp 
>> transfers significantly by parallelizing the SSH channels used for large 
>> transfers. This improvement is performed only by modifying the sftp 
>> client, which is a very small modification compared to the openssh 
>> codebase. The modification is not too complicated to review and validate 
>> (I did it) and does not change the default behavior of the cli.
> 
> I think you make a compelling argument. I admit that I haven't
> reviewed the patch, even though that is what matters the most.

If you want to review the code, here is a direct link to the patch:
https://github.com/openssh/openssh-portable/compare/V_8_2_P1...cea-hpc:V_8_2_P1_PSFTP1

> I guess that no one really minds ways to make SFTP scale, but ever since
> the patch was proposed I have been thinking that the parallel channel
> approach is likely to introduce a whole load of not very clean error
> conditions regarding reassembly, which need to be handled sensibly both
> within the sftp client and on the interface to outside/calling processes.
> Can you or Cyril say something about this?

Indeed, reassembly must be handled properly. The block size varies with the
filesystem you're writing to, so a chunk size that is a power of 2 (in bytes)
is the starting point. Performance-wise, our storage experts asked us to use
chunks of 2MB or more. I chose the arbitrary value of 64MB because I don't see
any advantage in using smaller chunks. After all, we're talking about parallel
transfers on high-bandwidth links. Of course this value can be discussed and
easily changed (it's the base_chunk_size variable). To be more precise, a chunk
size is a multiple of base_chunk_size, so a 4GB file transferred with 4
channels will be split into four 1GB chunks.
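
To make the arithmetic concrete, here is a minimal sketch in Python (not the
C code from the patch). It assumes the chunk size is the per-channel share of
the file rounded up to the next multiple of base_chunk_size; the exact
rounding rule and the plan_chunks name are assumptions made for illustration.

```python
BASE_CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, the value mentioned above


def plan_chunks(file_size, channels, base_chunk_size=BASE_CHUNK_SIZE):
    """Return a list of (offset, length) pairs covering the whole file.

    Assumption: the chunk size is the per-channel share of the file,
    rounded up to the next multiple of base_chunk_size.
    """
    if file_size <= 0 or channels <= 0:
        return []
    per_channel = -(-file_size // channels)                  # ceiling division
    multiples = max(1, -(-per_channel // base_chunk_size))   # at least one base chunk
    chunk_size = multiples * base_chunk_size
    # The last chunk is shortened so that the plan never runs past the file end.
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]


GB = 1024 ** 3
chunks = plan_chunks(4 * GB, 4)
for offset, length in chunks:
    print(offset // GB, "GB ->", length // GB, "GB")
```

With these assumptions, a file smaller than base_chunk_size ends up as a
single chunk, so parallelism only kicks in for large files.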

The main source of errors during reassembly is a copy_buffer_len (-B option)
that is not a power of 2. This leads to writes that sit partially on 2 blocks
and will probably corrupt the file. Writing the start and the end of a block
simultaneously from 2 different NFS clients is a really bad idea. That's why I
issue a warning if -n > 0 and -B is not a power of 2.
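
The check itself is cheap. The Python sketch below illustrates the rule (it is
not the patch's C code): should_warn mirrors the "-n > 0 and -B not a power of
2" condition, and count_straddling_writes counts the writes that land
partially on two blocks. The 1MB block size and all function names are
hypothetical, chosen only for the example.

```python
def is_power_of_two(n):
    """True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def should_warn(extra_channels, copy_buffer_len):
    """Warn if parallel channels are requested and -B is not a power of 2."""
    return extra_channels > 0 and not is_power_of_two(copy_buffer_len)


def count_straddling_writes(copy_buffer_len, block_size, total):
    """Count writes that cross a block boundary without being block-aligned."""
    count = 0
    for start in range(0, total, copy_buffer_len):
        end = min(start + copy_buffer_len, total)        # exclusive end offset
        crosses = start // block_size != (end - 1) // block_size
        aligned = start % block_size == 0 and end % block_size == 0
        if crosses and not aligned:
            count += 1
    return count


BLOCK = 1024 * 1024      # hypothetical 1MB filesystem block
TOTAL = 64 * BLOCK       # one 64MB chunk
for buffer_len in (32768, 30000):
    print(buffer_len,
          should_warn(4, buffer_len),
          count_straddling_writes(buffer_len, BLOCK, TOTAL))
```

When both the buffer length and the block size are powers of 2, one always
divides the other, so no write ever covers two blocks partially.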

Concerning error handling within threads: if a thread encounters a blocking
error, the other threads finish copying their current chunk, then stop doing
anything.
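
The stop behaviour can be sketched as follows. This is a simplified Python
model, not the patch's implementation: it assumes a shared queue of chunks and
a stop flag, and run_workers, copy_chunk and flaky_copy are hypothetical names
used only for illustration.

```python
import queue
import threading


def run_workers(chunks, copy_chunk, num_threads):
    """Copy chunks in parallel; after an error, no new chunk is started."""
    pending = queue.Queue()
    for chunk in chunks:
        pending.put(chunk)
    stop = threading.Event()
    lock = threading.Lock()
    done, errors = [], []

    def worker():
        while not stop.is_set():
            try:
                chunk = pending.get_nowait()
            except queue.Empty:
                return                      # nothing left to copy
            try:
                copy_chunk(chunk)           # a chunk in progress runs to completion
            except Exception as exc:
                with lock:
                    errors.append((chunk, exc))
                stop.set()                  # other workers stop after their chunk
                return
            with lock:
                done.append(chunk)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, errors


def flaky_copy(chunk):
    """Simulated copy that fails on chunk 2."""
    if chunk == 2:
        raise IOError("simulated error on chunk 2")


done, errors = run_workers(range(5), flaky_copy, num_threads=1)
print("copied:", done, "failed:", [chunk for chunk, _ in errors])
```

The point of the flag is that a chunk is never abandoned halfway by a healthy
thread; only the decision to start the next chunk depends on the error state.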

> And another thought - if the proposed patch and/or method indeed will not
> go anywhere, would it still be helpful for you if the sftp client would
> only expose the file offset functionality? That way, the complexity of
> reassembly and the associated error handling doesn't enter into OpenSSH.

There is no server-side change in this patch, so I don't think we can talk
about complexity of reassembly. Once base_chunk_size is set and a warning or an
error is raised if copy_buffer_len is not a power of 2, there is nothing more
to handle than during a reput or a reget.
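
As an illustration of why offset writes need no separate reassembly step, the
Python sketch below writes the chunks of a buffer into a local file out of
order and reads the result back. It only models the idea with plain file I/O;
it does not use the sftp protocol, and write_at and the tiny 4KB chunk size
are choices made for the demo.

```python
import os
import tempfile


def write_at(path, offset, data):
    """Write data at the given byte offset of an existing file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


source = bytes(range(256)) * 64      # 16KB of sample data
chunk_size = 4096                    # power of 2, tiny for the demo
chunks = [(off, source[off:off + chunk_size])
          for off in range(0, len(source), chunk_size)]

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    for offset, data in reversed(chunks):    # deliberately out of order
        write_at(path, offset, data)
    with open(path, "rb") as f:
        result = f.read()
finally:
    os.remove(path)

print("identical:", result == source)
```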

Of course, exposing the file offset functionality would help in building new
software on top of the sftp client, but at the cost of simplicity. And if you
do this, in the case of "get", the new software will need to send a lot of "ls"
commands to learn the directory structure and the file sizes in order to split
transfers correctly. I feel like it's reinventing the wheel.
-- 
Cyril