SFTP and outstanding requests

Chris Rapier rapier at psc.edu
Sat Apr 28 01:17:33 EST 2007


> Did you measure the memory impact of increasing it? I'm not averse to
> cranking it up if it improves performance and doesn't have too much of an
> effect on memory use.

Not yet. If I have a chance I'll try to get to that today.
Personally, I don't think it will affect memory use in the current
implementation, because I don't expect there to ever be that many
default-sized blocks in transit. I'll try to get some memory stats
using multiple block sizes and request counts.
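
As a rough back-of-the-envelope sketch of why the buffer cost should be
bounded: the payload in flight can't exceed block size times the number
of outstanding requests. The function names and the numbers used with
them (a 32768-byte block, 16 or 64 requests, a 1 Gbit/s path with 100 ms
RTT) are illustrative assumptions, not measured values or confirmed
defaults.

```python
import math


def max_in_flight_bytes(block_size: int, num_requests: int) -> int:
    """Upper bound on payload bytes requested but not yet answered."""
    return block_size * num_requests


def requests_to_fill_pipe(bandwidth_bps: float, rtt_s: float, block_size: int) -> int:
    """Smallest request count whose payload covers the bandwidth-delay product."""
    bdp_bytes = bandwidth_bps / 8 * rtt_s
    return max(1, math.ceil(bdp_bytes / block_size))


if __name__ == "__main__":
    for n in (16, 64):
        print(n, "requests ->", max_in_flight_bytes(32768, n), "bytes in flight at most")
    print("requests needed for 1 Gbit/s at 100 ms:",
          requests_to_fill_pipe(1e9, 0.1, 32768))
```

This only bounds the payload itself; per-request bookkeeping and any
copies made along the way would come on top of it, which is what the
memory stats would have to show.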

Obviously I'm interested in this because of its impact on the HPN work
I'm doing, so I'll be looking at that as well. If the memory impact is
too high on heavily used servers I'll have to figure out some other
approach.

>> What I am curious about, and maybe you can help point me to the right
>> portion of the code, is what happens when transferring multiple files
>> in SFTP (SCP as well). If you look at outstanding data graphs (derived
>> by tcptrace from a tcpdump) it seems that between each file there is
>> something happening that causes the network to drain completely and
>> then there is a 2RTT pause before the next file gets sent out. I can
>> put a copy of the data somewhere if you want to look at it. If I can
>> get a better understanding of what is happening there I can at least
>> explain to my users why they should do a tar pipe if they have many
>> small files.
> 
> There is a pipeline stall between each file because of the current client
> implementation. To fix this, sftp-client.c:do_(up|down)load really
> needs to be modified to accept a vector of files rather than a single
> file at a time.

Okay, this makes sense and I'll see if there is anything I can do there.
Would there be any benefit to prepping multiple files as a single data
stream and using a control channel to provide the offsets and file data
needed to rebuild the files on the far end? Mostly I'm just looking at
keeping the data pipe as full as possible.
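
To put a rough number on what the per-file stall costs, here is a
minimal model that assumes a fixed 2RTT gap between consecutive files
(the figure from the graphs mentioned above) and ignores the drain and
any ramp-up afterwards. The function names and example inputs are
hypothetical.

```python
def stall_overhead_s(num_files: int, rtt_s: float, stall_rtts: float = 2.0) -> float:
    """Idle time from the gap between consecutive files (none after the last)."""
    if num_files < 1:
        return 0.0
    return (num_files - 1) * stall_rtts * rtt_s


def transfer_time_s(num_files: int, file_bytes: int, bandwidth_bps: float,
                    rtt_s: float, stall_rtts: float = 2.0) -> float:
    """Time to send the payload at line rate plus the per-file stalls."""
    payload_s = num_files * file_bytes * 8 / bandwidth_bps
    return payload_s + stall_overhead_s(num_files, rtt_s, stall_rtts)


if __name__ == "__main__":
    # 1001 files of 10 kB each over a 1 Gbit/s path with 50 ms RTT.
    per_file = transfer_time_s(1001, 10_000, 1e9, 0.05)
    one_stream = transfer_time_s(1001, 10_000, 1e9, 0.05, stall_rtts=0.0)
    print(f"file at a time: {per_file:.2f} s, single stream: {one_stream:.2f} s")
```

With many small files the stall term dominates, which is the argument
for a tar pipe or a single combined stream.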

I see the same sort of behaviour in scp, which is why I was looking at
the ssh code. Is this mostly just due to similar methods being used in
both clients?

> The other point where we waste round-trips in sftp is globbing. There
> are lots of round-trips there and (worse) the glob implementation we
> use throws away the Attrib data, which we then have to refetch (another
> round-trip per file).

Interesting... I might look at this first. If I can reduce the number of 
RTTs even by 1 that would make a big difference in performance.
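
A toy illustration of the attribute refetch described above, counting
round trips against an in-memory stand-in for the server. Everything
here (ToyServer, the two glob functions, treating a directory read as a
single round trip) is a simplifying assumption for illustration and is
not how sftp-client.c or the glob code is actually structured.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class ToyServer:
    files: dict          # name -> size, standing in for the Attrib data
    round_trips: int = 0

    def readdir(self):
        """One round trip; returns (name, attrs) pairs for the directory."""
        self.round_trips += 1
        return list(self.files.items())

    def stat(self, name):
        """One round trip per file to fetch its attributes again."""
        self.round_trips += 1
        return self.files[name]


def glob_discarding_attrs(server, pattern):
    # Keep only the names, then refetch attributes one file at a time.
    names = [n for n, _ in server.readdir() if fnmatch(n, pattern)]
    return {n: server.stat(n) for n in names}


def glob_keeping_attrs(server, pattern):
    # Reuse the attributes that came back with the directory listing.
    return {n: attrs for n, attrs in server.readdir() if fnmatch(n, pattern)}
```

In this model the first version costs one round trip plus one per
matched file, while the second costs one in total.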

> A while back, I posted these as part of a list of TODOs for sftp. It's
> a pity I wasn't more organised earlier in the year because they would
> have been excellent projects for Google's Summer of Code.

Trust me I understand this. I had been planning on hiring a student this 
summer but I've had so much else going on that I just forgot about it 
until yesterday. Of course, I'll be gone for three weeks and by the time 
I get back all the students will have left :\


More information about the openssh-unix-dev mailing list