SCP with Resume Feature
rapier
rapier at psc.edu
Mon Apr 5 01:56:02 AEST 2021
On 4/3/21 8:10 PM, Demi Marie Obenour wrote:
> On 4/1/21 1:50 PM, rapier wrote:
>> Howdy all,
>>
>> I know development on SCP is discouraged but being that it's still in wide use
>> I thought I would do some work some of my users have been asking for and allow
>> SCP to resume from a partial transfer.
>
> Would it be possible to instead reimplement SCP in terms of SFTP, and then add
> this feature to SFTP? My understanding is that such a re-implementation is
> something many people have wanted for quite a while.
>
> Of course, this might very well be out of scope for the project, which would
> be fine.
Honestly, after working on the SCP code I do support that idea. SCP
really depends on an in-band control protocol that can get out of sync
and freeze the transfer process. The right thing might be to use SCP as
a wrapper for SFTP, mostly to maintain the user experience and existing
scripts. I may look at that depending on time and progress on other
aspects of this project.
> I suggest using a better hash than MD5, which is considered broken. Blake2b is
> both faster and much more secure.
I've been looking at several hashes for this: blake2, sha1, md5, and
xxhash. MD5 was the first pass at implementation and I've since changed
to using EVP contexts. I fully expect to go with blake2 in the end but I
need to run more performance tests. The hashing ends up being one of the
more expensive operations, especially on very large files (100s of MB to
GB), so that section is still subject to change.
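
To make the hashing cost concrete, here is a minimal Python sketch of hashing the first N bytes of a file in fixed-size chunks with a selectable algorithm. It is not the SCP patch itself: `hash_prefix`, its parameters, and the 1 MiB chunk size are hypothetical names and values chosen for illustration, and `hashlib.new()` stands in for the EVP contexts mentioned above.

```python
import hashlib


def hash_prefix(path, length, algorithm="blake2b", chunk_size=1 << 20):
    """Return the hex digest of the first `length` bytes of `path`.

    Hypothetical helper: the file is read in chunks so that very large
    files never have to fit in memory.
    """
    digest = hashlib.new(algorithm)
    remaining = length
    with open(path, "rb") as handle:
        while remaining > 0:
            chunk = handle.read(min(chunk_size, remaining))
            if not chunk:
                break  # file is shorter than `length`; hash what exists
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

Swapping the `algorithm` argument is enough to time different hashes against the same file, which is roughly the comparison still to be run.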
I am trying to figure out how to reduce the number of hash operations.
Let me lay it out to see if anyone has ideas (aside from using rsync -
which I fully support).
Source: stat file, get hash, send control sequence 'C' to target
        (Cfilemode, filesize(s), hash(s), filename(s))

Target: Receive control sequence from source
        If target exists
            compute hash(t)
            If hash(t) == hash(s)
                skip file (send skip control sequence 'S' to source)
            If hash(t) != hash(s)
                send control sequence 'R' to source
                (Rfilemode, filesize(t), hash(t))

Source: Receive control sequence from target
        If control == 'S'
            skip file
        If control == 'R'
            compute hash(r) of source file up to filesize(t)
            If hash(r) == hash(t)
                file fragments match
                mode = R (for resume)
                bytes = filesize(s) - filesize(t)
            If hash(r) != hash(t)
                fragments do not match
                mode = C (for create)
                bytes = filesize(s)
        send control to target (mode, bytes)

Target: Receive control seq from source
        If mode == R
            write bytes to temp file
            append temp file to target
        If mode == C
            write bytes to file
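
The exchange above can be sketched as two small decision functions, one per side. This is an illustrative Python model under stated assumptions, not the actual scp.c logic: `prefix_hash`, `target_reply`, and `source_decision` are hypothetical names, BLAKE2b is picked arbitrarily, the target file is assumed to exist, and create mode is assumed to send the whole source file.

```python
import hashlib
import os


def prefix_hash(path, length):
    """Hash the first `length` bytes of `path` (read in one go for brevity)."""
    with open(path, "rb") as handle:
        return hashlib.blake2b(handle.read(length)).hexdigest()


def target_reply(target_path, source_hash):
    """Target side: compare the local file with the hash the source announced."""
    size_t = os.path.getsize(target_path)
    hash_t = prefix_hash(target_path, size_t)
    if hash_t == source_hash:
        return ("S", size_t, hash_t)  # identical file: tell the source to skip
    return ("R", size_t, hash_t)      # differs: report what is already here


def source_decision(source_path, reply):
    """Source side: turn the target's reply into (mode, offset, byte_count)."""
    control, size_t, hash_t = reply
    size_s = os.path.getsize(source_path)
    if control == "S":
        return ("S", 0, 0)
    # hash(r): the source file hashed only up to the target's current size
    if prefix_hash(source_path, size_t) == hash_t:
        return ("R", size_t, size_s - size_t)  # fragments match: resume
    return ("C", 0, size_s)                    # mismatch: send the whole file
```

Written this way, the worst case is visible: a resumed file costs one full hash on the source, one hash of the fragment on the target, and one more hash of the same-length prefix on the source.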
I think rsync only computes hashes if the modification time, file
size, or other file stat data is different. I thought about doing that
but since you can rename the target with scp that won't work.
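
For reference, a stat-based check of the kind rsync uses by default (same size and same modification time means "assume unchanged") might look like the Python sketch below. `quick_check_same` is a hypothetical name, and the check only means anything if the receiving side preserved the source's modification time, which is an assumption that does not hold for a plain scp copy.

```python
import os


def quick_check_same(source_path, target_path):
    """Cheap pre-check: True when size and whole-second mtime both match.

    No file contents are read, so this avoids hashing entirely, at the
    cost of trusting the metadata.
    """
    source_stat = os.stat(source_path)
    target_stat = os.stat(target_path)
    return (
        source_stat.st_size == target_stat.st_size
        and int(source_stat.st_mtime) == int(target_stat.st_mtime)
    )
```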
Anyway, if anyone has any ideas on reducing the steps, hashes, etc., let
me know. I also cannot figure out why I can't append directly to the
target file. After opening the file I'd seek to the end but the bytes
would still be written starting at the 0th byte. I'm probably missing
something in atomicio. Writing the temp file and then appending works and
it's not taking up a lot of cycles but it doesn't feel like the 'right'
way to do it.
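
As a point of comparison only, this is what a direct append looks like at the POSIX level, shown through Python's thin `os` wrappers. It does not diagnose the atomicio behavior described above; `append_bytes` is a hypothetical helper. With `O_APPEND` set, the kernel positions every write at the current end of file regardless of earlier seeks, and the file must not be opened with `O_TRUNC`, which would discard the existing fragment.

```python
import os


def append_bytes(path, data):
    """Append `data` to an existing file without going through a temp file."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```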