Parallel transfers with sftp (call for testing / advice)
Matthieu Hautreux
matthieu.hautreux at cea.fr
Sat May 9 09:25:04 AEST 2020
On 06/05/2020 at 06:21, David Newall wrote:
> Did anything happen after
> https://daniel.haxx.se/blog/2010/12/08/making-sftp-transfers-fast/? I
> suspect it did, because we do now allow multiple outstanding packets,
> as well as specifying the buffer size.
>
> Daniel explained the process that SFTP uses quite clearly, such that
> I'm not sure why re-assembly is an issue. He explained that each
> transfer already specifies the offset within the file. It seems
> reasonable that multiple writers would just each write to the same
> file at their various different offsets. It relies on the target
> supporting sparse files, but supercomputers only ever run Linux ;-)
> which does do the right thing.
You are right, reassembly is not an issue, as long as you have sparse
file support, which is the case for us with Linux :)
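To make this concrete, here is a minimal sketch of the idea (my own
illustration, not code from the patch): two writers each call pwrite(2)
at their own offset into the same file. On a file system with
sparse-file support, the gap between the two regions simply reads back
as zeros until it is filled, so no separate reassembly step is needed.

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("out.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;

    char block[32768];

    /* "Channel 1": write the first block at offset 0. */
    memset(block, 'A', sizeof(block));
    pwrite(fd, block, sizeof(block), 0);

    /* "Channel 2": write a later block at its own offset; the file
     * stays sparse in between until some other writer fills the gap. */
    memset(block, 'B', sizeof(block));
    pwrite(fd, block, sizeof(block), (off_t)4 * sizeof(block));

    close(fd);
    return 0;
}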
>
> The original patch which we are discussing seemed more concerned about
> being able to connect to multiple IP addresses, rather than multiple
> connections between the same pair of machines. The issue, as I
> understand, is that the supercomputer has slow NICs, so adding
> multiple NICs allows greater network bandwidth. This, I think, is the
> problem to be solved; not re-assembly, just sending to what appear to
> be multiple different hosts (i.e. IP addresses.)
No, the primary goal of the patch is to enable that between two
endpoints with a single NIC each, the NIC being 10GbE or faster.
Here is an example showing roughly the same results for a single
destination/IP:
# With the patched sftp and 1+10 parallel SSH connections
[me@france openssh-portable]$ ./sftp -n 10 germany0
Connected main channel to germany0 (1.2.3.96).
Connected channel 1 to germany0 (1.2.3.96).
Connected channel 2 to germany0 (1.2.3.96).
Connected channel 3 to germany0 (1.2.3.96).
Connected channel 4 to germany0 (1.2.3.96).
Connected channel 5 to germany0 (1.2.3.96).
Connected channel 6 to germany0 (1.2.3.96).
Connected channel 7 to germany0 (1.2.3.96).
Connected channel 8 to germany0 (1.2.3.96).
Connected channel 9 to germany0 (1.2.3.96).
Connected channel 10 to germany0 (1.2.3.96).
sftp> get 5g 5g.bis
Fetching /files/5g to 5g.bis
/files/5g 100% 5120MB 706.7MB/s 00:07
sftp> put 5g.bis
Uploading 5g.bis to /files/5g.bis
5g.bis 100% 5120MB 664.0MB/s 00:07
sftp>
# With the legacy sftp:
[me@france openssh-portable]$ sftp germany0
sftp> get 5g 5g.bis
Fetching /files/5g to 5g.bis
/p/scratch/chpsadm/files/5g 100% 5120MB 82.8MB/s 01:01
sftp> put 5g.bis
Uploading 5g.bis to /files/5g.bis
5g.bis 100% 5120MB 67.0MB/s 01:16
sftp>
# With scp:
[me@france openssh-portable]$ scp 5g germany0:/files/5g.bis
5g 100% 5120MB 83.1MB/s 01:01
# With rsync:
[me@france openssh-portable]$ rsync -v 5g germany0:/files/5g.bis
5g
sent 5,370,019,908 bytes received 35 bytes 85,920,319.09 bytes/sec
total size is 5,368,709,120 speedup is 1.00
>
> I was curious to know why a supercomputer would have issues receiving
> at some high-bandwidth via a single NIC, while the sending machine has
> no such performance issue; but that's an aside.
Supercomputers commonly offer multiple "login nodes" and a generic DNS
entry to connect to one of them randomly: the DNS entry is associated
with multiple IP addresses and the client (DNS resolver) selects one of
them. Other DNS entries may exist to target a particular login node, in
case you want to reach a specific machine.
When used with Cyril's patched sftp, this logic means that you
automatically target multiple hosts if you use the generic DNS entry
(as in Cyril's first performance results). If you select the DNS entry
of a particular host (as in this example), then you only contact that
single host.
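For illustration, a minimal sketch of what the resolver sees with such
a generic entry (the name login.example.org is a placeholder, not a
real site): getaddrinfo(3) returns every address the entry maps to, and
a plain client typically just takes the first one, which is how the
round-robin entry spreads sessions across the login nodes.

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    struct addrinfo hints, *res, *p;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 A records only */
    hints.ai_socktype = SOCK_STREAM;

    /* Resolve the generic login-node entry; each A record behind it
     * is one login node. */
    if (getaddrinfo("login.example.org", "ssh", &hints, &res) != 0)
        return 1;

    for (p = res; p != NULL; p = p->ai_next) {
        char addr[INET_ADDRSTRLEN];
        struct sockaddr_in *sin = (struct sockaddr_in *)p->ai_addr;
        inet_ntop(AF_INET, &sin->sin_addr, addr, sizeof(addr));
        printf("%s\n", addr);
    }
    freeaddrinfo(res);
    return 0;
}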
On supercomputers, files are commonly stored on distributed file
systems like NFS, Lustre, GPFS, ... If your transfers target one of
those file systems, you can use multiple hosts as destinations without
any issue. You just need to ensure that the blocks sftp sends/writes
are properly sized, so that one target does not overwrite another's
data because of the file system client implementations and the
asynchrony of the page cache flushes on the involved nodes. That is
what is done in the patch: as Cyril explained in a previous message,
the block size used for parallel transfers was selected with that
potential issue in mind.
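As an illustration of that sizing constraint, here is a minimal sketch
of the offset arithmetic (the 4 MiB block size is my assumption, not
the patch's actual constant): blocks are fixed-size, aligned and
non-overlapping, so flushes from different file system clients never
dirty the same range.

#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE ((uint64_t)4 * 1024 * 1024)  /* assumed, see above */

int main(void) {
    uint64_t file_size = (uint64_t)5120 * 1024 * 1024;  /* 5 GiB */
    uint64_t nblocks = (file_size + BLOCK_SIZE - 1) / BLOCK_SIZE;
    int nchannels = 10;

    /* Blocks are dealt out round-robin; each channel's writes land on
     * BLOCK_SIZE-aligned offsets, so no two clients ever touch the
     * same block. */
    for (uint64_t b = 0; b < nblocks; b++) {
        int channel = (int)(b % nchannels);
        uint64_t offset = b * BLOCK_SIZE;
        if (b < 12)  /* print just the first few assignments */
            printf("block %llu -> channel %d, offset %llu\n",
                   (unsigned long long)b, channel,
                   (unsigned long long)offset);
    }
    return 0;
}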
Regards,
Matthieu