Parallel transfers with sftp (call for testing / advice)
Matthieu Hautreux
matthieu.hautreux at cea.fr
Sat May 9 09:25:04 AEST 2020
On 06/05/2020 at 06:21, David Newall wrote:
> Did anything happen after
> https://daniel.haxx.se/blog/2010/12/08/making-sftp-transfers-fast/? I
> suspect it did, because we do now allow multiple outstanding packets,
> as well as specifying the buffer size.
>
> Daniel explained the process that SFTP uses quite clearly, such that
> I'm not sure why re-assembly is an issue. He explained that each
> transfer already specifies the offset within the file. It seems
> reasonable that multiple writers would just each write to the same
> file at their various different offsets. It relies on the target
> supporting sparse files, but supercomputers only ever run Linux ;-)
> which does do the right thing.
You are right, reassembly is not an issue, as long as you have sparse
file support, which is the case for us with Linux :)
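To make this concrete, here is a minimal sketch of the idea (my own
illustration, not code from the patch): two writers each call pwrite(2)
at their own offset into the same file. On a file system with
sparse-file support, the gap between the two regions simply reads back
as zeros until it is filled, so no separate reassembly step is needed.

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("out.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;

    char block[32768];

    /* "Channel 1": write the first block at offset 0. */
    memset(block, 'A', sizeof(block));
    pwrite(fd, block, sizeof(block), 0);

    /* "Channel 2": write a later block at its own offset; the file
     * stays sparse in between until some other writer fills the gap. */
    memset(block, 'B', sizeof(block));
    pwrite(fd, block, sizeof(block), (off_t)4 * sizeof(block));

    close(fd);
    return 0;
}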
>
> The original patch which we are discussing seemed more concerned about
> being able to connect to multiple IP addresses, rather than multiple
> connections between the same pair of machines. The issue, as I
> understand, is that the supercomputer has slow NICs, so adding
> multiple NICs allows greater network bandwidth. This, I think, is the
> problem to be solved; not re-assembly, just sending to what appear to
> be multiple different hosts (i.e. IP addresses.)
No, the primary goal of the patch is to enable that between two
endpoints with a single NIC each, the NIC being 10GbE or faster.
Here is an example showing roughly the same results for a single
destination/IP:
# With the patched sftp and 1+10 parallel SSH connections
[me@france openssh-portable]$ ./sftp -n 10 germany0
Connected main channel to germany0 (1.2.3.96).
Connected channel 1 to germany0 (1.2.3.96).
Connected channel 2 to germany0 (1.2.3.96).
Connected channel 3 to germany0 (1.2.3.96).
Connected channel 4 to germany0 (1.2.3.96).
Connected channel 5 to germany0 (1.2.3.96).
Connected channel 6 to germany0 (1.2.3.96).
Connected channel 7 to germany0 (1.2.3.96).
Connected channel 8 to germany0 (1.2.3.96).
Connected channel 9 to germany0 (1.2.3.96).
Connected channel 10 to germany0 (1.2.3.96).
sftp> get 5g 5g.bis
Fetching /files/5g to 5g.bis
/files/5g 100% 5120MB 706.7MB/s 00:07
sftp> put 5g.bis
Uploading 5g.bis to /files/5g.bis
5g.bis 100% 5120MB 664.0MB/s 00:07
sftp>
# With the legacy sftp:
[me@france openssh-portable]$ sftp germany0
sftp> get 5g 5g.bis
Fetching /files/5g to 5g.bis
/p/scratch/chpsadm/files/5g 100% 5120MB 82.8MB/s 01:01
sftp> put 5g.bis
Uploading 5g.bis to /files/5g.bis
5g.bis 100% 5120MB 67.0MB/s 01:16
sftp>
# With scp:
[me@france openssh-portable]$ scp 5g germany0:/files/5g.bis
5g 100% 5120MB 83.1MB/s 01:01
# With rsync:
[me@france openssh-portable]$ rsync -v 5g germany0:/files/5g.bis
5g
sent 5,370,019,908 bytes received 35 bytes 85,920,319.09 bytes/sec
total size is 5,368,709,120 speedup is 1.00
>
> I was curious to know why a supercomputer would have issues receiving
> at some high-bandwidth via a single NIC, while the sending machine has
> no such performance issue; but that's an aside.
Supercomputers commonly offer multiple "login nodes" and a generic DNS
entry to connect to one of them randomly: the DNS entry is associated
with multiple IP addresses and the client (DNS resolver) selects one of
them. Other DNS entries may exist to target a particular login node, in
case you want to reach a specific machine.
When used with Cyril's patched sftp, this logic means that you
automatically target multiple hosts if you use the generic DNS entry
(as in Cyril's first performance results). If you select the DNS entry
of a particular host (as in this example), then you only contact that
single host.
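For illustration, a minimal sketch of what the resolver sees with such
a generic entry (the name login.example.org is a placeholder, not a
real site): getaddrinfo(3) returns every address the entry maps to, and
a plain client typically just takes the first one, which is how the
round-robin entry spreads sessions across the login nodes.

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    struct addrinfo hints, *res, *p;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 A records only */
    hints.ai_socktype = SOCK_STREAM;

    /* Resolve the generic login-node entry; each A record behind it
     * is one login node. */
    if (getaddrinfo("login.example.org", "ssh", &hints, &res) != 0)
        return 1;

    for (p = res; p != NULL; p = p->ai_next) {
        char addr[INET_ADDRSTRLEN];
        struct sockaddr_in *sin = (struct sockaddr_in *)p->ai_addr;
        inet_ntop(AF_INET, &sin->sin_addr, addr, sizeof(addr));
        printf("%s\n", addr);
    }
    freeaddrinfo(res);
    return 0;
}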
On supercomputers, files are commonly stored on distributed file
systems like NFS, Lustre, GPFS, ... If your transfers target one of
those file systems, you can use multiple hosts as destinations without
any issue. You just need to ensure that the blocks sftp sends/writes
are properly sized, so that one target does not overwrite another's
data because of the file system client implementations and the
asynchrony of the page cache flushes on the involved nodes. That is
what is done in the patch: as Cyril explained in a previous message,
the block size used for parallel transfers was selected with that
potential issue in mind.
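As an illustration of that sizing constraint, here is a minimal sketch
of the offset arithmetic (the 4 MiB block size is my assumption, not
the patch's actual constant): blocks are fixed-size, aligned and
non-overlapping, so flushes from different file system clients never
dirty the same range.

#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE ((uint64_t)4 * 1024 * 1024)  /* assumed, see above */

int main(void) {
    uint64_t file_size = (uint64_t)5120 * 1024 * 1024;  /* 5 GiB */
    uint64_t nblocks = (file_size + BLOCK_SIZE - 1) / BLOCK_SIZE;
    int nchannels = 10;

    /* Blocks are dealt out round-robin; each channel's writes land on
     * BLOCK_SIZE-aligned offsets, so no two clients ever touch the
     * same block. */
    for (uint64_t b = 0; b < nblocks; b++) {
        int channel = (int)(b % nchannels);
        uint64_t offset = b * BLOCK_SIZE;
        if (b < 12)  /* print just the first few assignments */
            printf("block %llu -> channel %d, offset %llu\n",
                   (unsigned long long)b, channel,
                   (unsigned long long)offset);
    }
    return 0;
}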
Regards,
Matthieu