SCP with Resume Feature
rapier
rapier at psc.edu
Fri Apr 9 05:21:36 AEST 2021
On 4/7/21 10:41 AM, Ron Frederick wrote:
> That said, is the SCP implementation in OpenSSH currently doing any file-level parallelization? I wouldn’t expect it to, so I’m not sure that would explain the performance difference. If I had to guess, it’s more likely due to the fact that there’s a single round-trip with SCP for each file transfer, whereas SFTP involves separate requests to do an open(), read(), stat(), etc. each of which has its own round-trip. Some of those (such as the read() calls) are parallelized, but you still have to pay for the open() before beginning the reads, and possibly for other things like stat() when preserving attributes.
>
No parallelization at all. I've thought about it, but I'll have to come
back to it when I have time; there are other deliverables for this
project I need to focus on first. As for the number of round trips (RTs),
there are a couple of message round trips but nothing significant. The
resume feature increases the number of RTs but it's still faster.
I absolutely agree with Damien about the pipeline stalling being the
major factor. Anyway, I've been looking at learning more about
pipelining. :)
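
To make the round-trip argument concrete, here is a rough
back-of-the-envelope sketch in Python. It is only an illustrative model,
not a measurement of scp or sftp: the function names, the 50 ms RTT, the
1 Gb/s link, and the round-trip counts are all assumptions picked for
the example, and it ignores any stalls inside a single file's data stream.

```python
def per_file_overhead(rtt_s, round_trips):
    """Latency cost of the serialized request/response exchanges for one file."""
    return rtt_s * round_trips


def transfer_time(n_files, file_bytes, bandwidth_Bps, rtt_s, round_trips):
    """Estimated total time when each file pays its round trips in full
    and the data itself streams at link speed (hypothetical model)."""
    stream_s = file_bytes / bandwidth_Bps
    return n_files * (stream_s + per_file_overhead(rtt_s, round_trips))


if __name__ == "__main__":
    link = 125e6          # assumed 1 Gb/s link, in bytes per second
    rtt = 0.05            # assumed 50 ms round-trip time
    # 10,000 files of 1 MB each, with 1 vs. 3 serialized round trips per file
    for trips in (1, 3):
        t = transfer_time(10_000, 1_000_000, link, rtt, trips)
        print(f"{trips} round trip(s) per file: {t:.0f} s")
```

With these made-up numbers the data itself needs only 8 ms per file, so
the per-file round trips dominate the total for many small files, while
for a few very large files they become negligible.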
In some cases there *might* be an issue with hitting the outstanding
message request limit, but that's not what's happening here. I really do
want to take a closer look at this - especially if SCP is going to
default to the SFTP protocol soon. In the high performance computing
community we do have faster transport tools like GridFTP and Aspera, but
they have some serious barriers to entry for a lot of users. SCP is
still widely used for transferring large data sets (people moving TBs of
data via SCP isn't uncommon where I work), so performance in those
environments is a concern of mine.
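
For readers wondering when an outstanding request limit *could* become
the bottleneck on long, fast paths, the following Python sketch shows
the usual window-over-RTT ceiling. It is a simplified model, not a
description of what any particular client does: the function names are
hypothetical, and the 64 requests and 32768-byte request size are meant
to mirror the documented defaults of the OpenSSH sftp client's -R and -B
options. Treat them, the RTTs, and the link speed as illustrative
assumptions.

```python
def max_inflight_throughput(outstanding_requests, request_bytes, rtt_s):
    """Upper bound in bytes/s when at most `outstanding_requests` requests of
    `request_bytes` each can be unacknowledged during one round trip."""
    return outstanding_requests * request_bytes / rtt_s


def effective_throughput(link_Bps, outstanding_requests, request_bytes, rtt_s):
    """The transfer can go no faster than the link or the request window."""
    window_cap = max_inflight_throughput(outstanding_requests, request_bytes, rtt_s)
    return min(link_Bps, window_cap)


if __name__ == "__main__":
    link = 1.25e9                     # assumed 10 Gb/s link, in bytes per second
    for rtt in (0.001, 0.05):         # assumed LAN-like and WAN-like RTTs
        bps = effective_throughput(link, 64, 32768, rtt)
        print(f"RTT {rtt * 1000:.0f} ms: about {bps / 1e6:.1f} MB/s")
```

In this model the 1 ms path is limited by the link, while the 50 ms path
is capped by the request window at roughly 42 MB/s regardless of how
fast the link is.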