Parallel transfers with sftp (call for testing / advice)

Nico Kadel-Garcia nkadel at gmail.com
Wed May 6 11:16:38 AEST 2020


On Tue, May 5, 2020 at 4:31 AM Peter Stuge <peter at stuge.se> wrote:
>
> Matthieu Hautreux wrote:
> > The change proposed by Cyril in sftp is a very pragmatic approach to
> > deal with parallelism at the file transfer level. It leverages the
> > already existing sftp protocol and its capability to write/read file
> > content at specified offsets. This makes it possible to speed up sftp
> > transfers significantly by parallelizing the SSH channels used for large
> > transfers. This improvement is performed only by modifying the sftp
> > client, which is a very small modification compared to the openssh
> > codebase. The modification is not too complicated to review and validate
> > (I did it) and does not change the default behavior of the cli.
>
> I think you make a compelling argument. I admit that I haven't
> reviewed the patch, even though that is what matters the most.
>
> I guess that no one really minds ways to make SFTP scale, but ever since
> the patch was proposed I have been thinking that the parallel channel
> approach is likely to introduce a whole load of not very clean error
> conditions regarding reassembly, which need to be handled sensibly both
> within the sftp client and on the interface to outside/calling processes.
> Can you or Cyril say something about this?

I find it an unnecessary feature given the possibilities of
out-of-band parallelism with multiple scp sessions transmitting
different manifests of files, of sftp to do the same thing, and of
tools like rsync to do it more efficiently by avoiding replication of
previously transmitted data and by reconnecting to complete partial
transmissions. It sounds like a bad case of "here, let me do this at a
different level of the stack" that is not normally necessary and has
already been done more completely and efficiently by other tools.
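
As a rough sketch of the out-of-band approach described above, a file
list can be split into one manifest per session before any transfer
starts. The function name split_manifest and the (path, size) input
format are assumptions made for illustration; nothing here is part of
OpenSSH or rsync. It uses a simple greedy rule: the largest remaining
file goes to the manifest with the fewest bytes so far.

```python
import heapq


def split_manifest(files, sessions):
    """Split (path, size_bytes) pairs into `sessions` roughly equal manifests.

    Hypothetical helper: greedy "largest file first" balancing, so that
    each independent scp/sftp/rsync session gets a similar byte count.
    """
    if sessions < 1:
        raise ValueError("sessions must be at least 1")

    # Min-heap of (bytes assigned so far, manifest index).
    heap = [(0, i) for i in range(sessions)]
    manifests = [[] for _ in range(sessions)]

    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        total, idx = heapq.heappop(heap)   # least-loaded manifest
        manifests[idx].append(path)
        heapq.heappush(heap, (total + size, idx))

    return manifests


if __name__ == "__main__":
    files = [("a.iso", 100), ("b.tar", 60), ("c.img", 50), ("d.txt", 10)]
    for n, manifest in enumerate(split_manifest(files, 2)):
        print(n, manifest)
```

Each resulting list could then be written to a file and handed to a
separate session, for example through rsync's --files-from option, with
the sessions started side by side from a shell or a job scheduler.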

> And another thought - if the proposed patch and/or method indeed will not
> go anywhere, would it still be helpful for you if the sftp client would
> only expose the file offset functionality? That way, the complexity of
> reassembly and the associated error handling doesn't enter into OpenSSH.

Re-assembly, error handling, and delivery verification were done by
rsync ages ago. It really seems like re-inventing the wheel.
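
To make the three concerns concrete, here is a minimal local sketch of
what any offset-based parallel copy has to do: write chunks at their
offsets, surface a failure from any chunk, and verify the result. It
copies between two local files rather than over sftp, and the names
chunk_ranges, copy_chunk, and parallel_copy are hypothetical. It is
neither the proposed patch nor rsync's algorithm, and it assumes a Unix
system where os.pwrite is available.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(size, chunk):
    """Return (offset, length) pairs covering `size` bytes."""
    return [(off, min(chunk, size - off)) for off in range(0, size, chunk)]


def copy_chunk(src, dst_fd, offset, length):
    """Copy one chunk; raise if the read or the write comes up short."""
    with open(src, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        raise IOError(f"short read at offset {offset}")
    if os.pwrite(dst_fd, data, offset) != length:
        raise IOError(f"short write at offset {offset}")


def file_digest(path):
    """SHA-256 of a file, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def parallel_copy(src, dst, chunk=1 << 20, workers=4):
    size = os.path.getsize(src)
    fd = os.open(dst, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, size)  # reserve the final length up front
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(copy_chunk, src, fd, off, n)
                       for off, n in chunk_ranges(size, chunk)]
            for fut in futures:
                fut.result()  # re-raises the first chunk failure, if any
    finally:
        os.close(fd)
    # Delivery verification: compare whole-file digests.
    if file_digest(src) != file_digest(dst):
        raise IOError("digest mismatch after reassembly")
```

Even this toy version leaves open what to do with a half-written
destination when one chunk fails, which is the kind of error condition
raised earlier in the thread.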


More information about the openssh-unix-dev mailing list