SCP with Resume Feature

rapier rapier at psc.edu
Mon Apr 5 01:56:02 AEST 2021



On 4/3/21 8:10 PM, Demi Marie Obenour wrote:
> On 4/1/21 1:50 PM, rapier wrote:
>> Howdy all,
>>
>> I know development on SCP is discouraged, but since it's still in wide
>> use I thought I would do some work that some of my users have been asking
>> for and allow SCP to resume from a partial transfer.
> 
> Would it be possible to instead reimplement SCP in terms of SFTP, and then add
> this feature to SFTP?  My understanding is that such a re-implementation is
> something many people have wanted for quite a while.
> 
> Of course, this might very well be out of scope for the project, which would
> be fine.

Honestly, after working on the SCP code, I do support that idea. SCP
really depends on an in-band control protocol that can get out of sync
and freeze the transfer. The right approach might be to make SCP a
wrapper around SFTP, mostly to preserve the user experience and existing
scripts. I may look at that, depending on time and progress on other
aspects of this project.

> I suggest using a better hash than MD5, which is considered broken.  Blake2b is
> both faster and much more secure.

I've been looking at several hashes for this: blake2, sha1, md5, and
xxhash. MD5 was the first pass at an implementation, and I've since
switched to OpenSSL EVP digest contexts. I fully expect to go with
blake2 in the end, but I need to run more performance tests. Hashing
ends up being one of the more expensive operations, especially on very
large files (hundreds of MB to multiple GB), so that section is still
subject to change.
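
Whatever digest is chosen, every byte being compared has to be read and
hashed, so the work is best done in a streaming init/update/final loop
that keeps memory constant regardless of file size. A minimal sketch,
assuming a POSIX file descriptor: FNV-1a is used only as a
self-contained, non-cryptographic stand-in, and hash_ctx, hash_init,
hash_update and hash_fd_prefix are hypothetical names. With OpenSSL the
same loop would call EVP_DigestInit_ex() with EVP_blake2b512(), then
EVP_DigestUpdate() and EVP_DigestFinal_ex(). The limit parameter lets
one routine produce either a whole-file hash or a hash over only the
first N bytes.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* FNV-1a (64-bit) constants: a non-cryptographic stand-in digest. */
#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* Hypothetical streaming context, shaped like an EVP_MD_CTX loop. */
typedef struct {
	uint64_t h;
} hash_ctx;

static void hash_init(hash_ctx *c)
{
	c->h = FNV64_OFFSET;
}

static void hash_update(hash_ctx *c, const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		c->h ^= p[i];
		c->h *= FNV64_PRIME;
	}
}

/*
 * Hash the first 'limit' bytes read from fd, or everything up to EOF
 * when limit < 0. Reads in 64 KiB chunks so memory use is constant.
 * Returns 0 on success, -1 on a read error or if the file ends before
 * 'limit' bytes were seen.
 */
static int hash_fd_prefix(int fd, off_t limit, uint64_t *out)
{
	unsigned char buf[64 * 1024];
	hash_ctx ctx;
	off_t done = 0;

	hash_init(&ctx);
	while (limit < 0 || done < limit) {
		size_t want = sizeof(buf);
		if (limit >= 0 && (off_t)want > limit - done)
			want = (size_t)(limit - done);
		ssize_t n = read(fd, buf, want);
		if (n < 0) {
			if (errno == EINTR)
				continue;       /* interrupted: retry */
			return -1;
		}
		if (n == 0)
			break;                  /* EOF */
		hash_update(&ctx, buf, (size_t)n);
		done += n;
	}
	if (limit >= 0 && done < limit)
		return -1;                      /* file shorter than prefix */
	*out = ctx.h;
	return 0;
}
```

Calling it with limit = filesize(t) gives the prefix hash the source
needs when the target reports a partial file; limit = -1 gives the
whole-file hash.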

I am trying to figure out how to reduce the number of hash operations.
Let me lay it out to see if anyone has ideas (aside from using rsync,
which I fully support).

Source: stat file, compute hash(s), send control sequence 'C' to target
        (Cfilemode, filesize(s), hash(s), filename(s))
Target: receive control sequence from source
        if target file exists
            compute hash(t)
            if hash(t) == hash(s)
                skip file (send skip control sequence 'S' to source)
            if hash(t) != hash(s)
                send control sequence 'R' to source
                (Rfilemode, filesize(t), hash(t))
Source: receive control sequence from target
        if control == 'S'
            skip file
        if control == 'R'
            compute hash(r) of the first filesize(t) bytes of the source file
            if hash(r) == hash(t)
                file fragments match
                mode = R (for resume)
                bytes = filesize(s) - filesize(t)
            if hash(r) != hash(t)
                fragments do not match
                mode = C (for create)
                bytes = filesize(s)
            send control sequence to target (mode, bytes)
Target: receive control sequence from source
        if mode == R
            write bytes to temp file
            append temp file to target
        if mode == C
            write bytes to file
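
The source-side decision after an 'R' reply can be sketched as a small
pure function. This is an illustration under assumptions: plan_transfer
and struct xfer_plan are hypothetical names, and a target that is
already larger than the source (a case the outline above does not
cover) is treated as a fresh create, since the source cannot hash a
prefix longer than its own file. In resume mode the source has to start
reading at offset filesize(t), not 0, so the plan carries the offset as
well as the byte count; in create mode the whole source file,
filesize(s) bytes, is sent.

```c
#include <sys/types.h>

enum xfer_mode { MODE_CREATE = 'C', MODE_RESUME = 'R' };

struct xfer_plan {
	enum xfer_mode mode;
	off_t offset;   /* where the source starts reading its file */
	off_t bytes;    /* how many bytes it then sends */
};

/*
 * src_size:       filesize(s)
 * dst_size:       filesize(t), taken from the target's 'R' reply
 * prefix_matches: nonzero if hash(r) == hash(t)
 */
static struct xfer_plan plan_transfer(off_t src_size, off_t dst_size,
    int prefix_matches)
{
	struct xfer_plan p;

	if (prefix_matches && dst_size <= src_size) {
		/* Resume: keep the target's bytes, send only the tail. */
		p.mode = MODE_RESUME;
		p.offset = dst_size;
		p.bytes = src_size - dst_size;
	} else {
		/* Create: fragments differ (or target is larger), resend all. */
		p.mode = MODE_CREATE;
		p.offset = 0;
		p.bytes = src_size;
	}
	return p;
}
```

For example, a 1000-byte source and a 400-byte matching fragment give
mode R, offset 400, 600 bytes to send.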

I think rsync only computes hashes when the modification time, file
size, or other stat data differ. I thought about doing that, but since
you can rename the target with scp, that won't work.
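
For comparison, rsync's default quick check (skip a file whose size and
modification time both match) amounts to comparing two stat results. A
minimal sketch; quick_check_equal and paths_quick_equal are
hypothetical helper names. One limitation worth keeping in mind:
mtimes only line up if the transfer preserved them (as scp -p does),
and a partial file will by design differ from the source in size, so a
check like this can at best short-circuit the skip case. It cannot
confirm that a partial file's prefix matches; that still needs a hash.

```c
#include <sys/stat.h>

/* 1 if size and whole-second mtime match, 0 otherwise. */
static int quick_check_equal(const struct stat *a, const struct stat *b)
{
	return a->st_size == b->st_size && a->st_mtime == b->st_mtime;
}

/* Stat both paths; returns 1/0 as above, or -1 if either stat fails. */
static int paths_quick_equal(const char *src, const char *dst)
{
	struct stat a, b;

	if (stat(src, &a) == -1 || stat(dst, &b) == -1)
		return -1;
	return quick_check_equal(&a, &b);
}
```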

Anyway, if anyone has any ideas on reducing the steps, hashes, etc., let
me know. I also can't figure out why appending directly to the target
file doesn't work. After opening the file I'd seek to the end, but the
bytes would still start at the 0th byte. I'm probably missing something
in atomicio. Writing a temp file and then appending it works, and it's
not taking many cycles, but it doesn't feel like the 'right' way to do
it.
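
For reference, appending in place with plain POSIX calls can be
sketched as below; append_bytes is a hypothetical helper, and its retry
loop plays roughly the role atomicio() plays in OpenSSH (finishing
short writes and retrying on EINTR). Opening with O_APPEND makes every
write() land at end-of-file, so no lseek() is needed. If a seek to the
end appears to be ignored, things worth checking (assumptions, not
verified against scp.c) are an O_TRUNC in the open() flags, a later
ftruncate() to the transfer length, or the path being reopened on a
second descriptor whose offset starts at 0.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Append len bytes to the end of an existing file. There is no O_TRUNC
 * and no later ftruncate(): either would discard the data already on
 * disk. Returns 0 on success, -1 on error.
 */
static int append_bytes(const char *path, const void *data, size_t len)
{
	const unsigned char *p = data;
	int fd = open(path, O_WRONLY | O_APPEND);

	if (fd == -1)
		return -1;
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;       /* interrupted: retry */
			close(fd);
			return -1;
		}
		p += n;                         /* handle short writes */
		len -= (size_t)n;
	}
	return close(fd);
}
```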


More information about the openssh-unix-dev mailing list