reget reput again...

Mon Dec 6 23:17:04 EST 2004

I would be in favour of both mechanisms being in place, not wanting to 
stifle progress: reput/reget with and without checksumming.  However, 
rsync is a better tool for a checksummed file copy IMHO.

When you start checksumming files, the huge IO overhead at both ends 
means you are really talking about a background or batch 
file copy/mirroring service.  I regard SFTP usage as interactive, and 
therefore the user is expected to understand the factors affecting 
modification of the source and target files at the time of the 
operation.  Nor would the user want to wait out the IO time for 
the file to be read while he is interactively using his SFTP client 
program.  A simple restarted-transfer facility brings SFTP into line 
with FTP without being too over the top about it.  Having a file 
checksum option that is generic would certainly be a useful feature: 
it would allow the client to perform an intelligent file copy and also 
a checksummed file compare (between two files on the server system too).

Your new checksumming mechanism should be a primitive building block, so 
that the more complex decisions can be made from its result.  So: a simple 
request from client to server of "Report checksum(s) for this file over 
this byte range", and maybe an abort of the previous instruction, if the 
protocol permits this.  The client-to-server instruction may be composed of:

File: the target file
Offset: start point to checksum from
Length: end point to checksum to (special EOF case)
Blocksize: report a checksum result every N bytes (special infinity case 
to mean 1 result for the whole file)
Accepted/Preferred list of checksum algorithms / digests and their usage 
options: (maybe whether MD4/MD5/SHA1 is LSB truncated, etc...).  Don't 
invent a new encoding for this; maybe take a look at ASN.1 syntax and 
numbers.  Maybe it's suggested that a basic algorithm should exist (MD4 
or MD5 ?) in all implementations, but not as a mandatory requirement.  In 
the worst case the server will report an error "No matching algo agreed 
between us".
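
To make the field list above concrete, here is a rough sketch of how such 
a request might be packed and unpacked.  Everything in it is an assumption 
for illustration only: the function names, the field widths (64-bit offset 
and length, 32-bit blocksize), treating Length as a byte count, using 0 as 
the "EOF" and "infinity" special cases, and borrowing the SSH-style 
length-prefixed string and comma-separated name-list for the path and the 
algorithm list.  None of this comes from the filexfer draft.

```python
import struct

# All names and sentinel values below are hypothetical.
EOF_LENGTH = 0    # assumed sentinel: checksum through to end of file
WHOLE_RANGE = 0   # assumed sentinel: one result for the whole range


def ssh_string(data: bytes) -> bytes:
    """Length-prefixed string: uint32 byte count, then the raw bytes."""
    return struct.pack(">I", len(data)) + data


def read_string(buf: bytes, pos: int):
    """Read one length-prefixed string; return (bytes, next position)."""
    (n,) = struct.unpack_from(">I", buf, pos)
    start = pos + 4
    return buf[start:start + n], start + n


def encode_checksum_request(path, offset, length, blocksize, algos):
    """Pack File, Offset, Length, Blocksize and the algorithm list."""
    return (
        ssh_string(path.encode("utf-8"))
        + struct.pack(">QQI", offset, length, blocksize)
        + ssh_string(",".join(algos).encode("ascii"))
    )


def decode_checksum_request(buf):
    """Inverse of encode_checksum_request."""
    path, pos = read_string(buf, 0)
    offset, length, blocksize = struct.unpack_from(">QQI", buf, pos)
    pos += struct.calcsize(">QQI")
    algos, pos = read_string(buf, pos)
    return (path.decode("utf-8"), offset, length, blocksize,
            algos.decode("ascii").split(","))


def choose_algo(client_prefs, server_supported):
    """Pick the first client preference the server also supports."""
    for name in client_prefs:
        if name in server_supported:
            return name
    raise ValueError("No matching algo agreed between us")
```

With this layout the server can reject the request early via choose_algo() 
before touching the file, which keeps the error case cheap.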

A suggestion: the checksum data coming back to the client should be 
in blocks, possibly containing more than one checksum in each block.  At 
the head of each block is the offset at which the first checksum (in this 
block) starts.

<status><offset_of_result0><result0><result1><result2><result3>

There should be a special block sent (with a different status, to denote 
the last checksum); this last block should indicate the exact length of the 
input data that went into it.

<status><offset_of_result0><result0><length_of_input_data_to_result0>
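
As a sketch of the server side of the two block layouts above: the code 
below digests a byte range in Blocksize pieces and packs the results into 
reply blocks.  The status values, the one-byte status field, the 64-bit 
offset and length, the grouping of four digests per block and the choice of 
SHA1 are all my assumptions for illustration; length=None stands for the 
EOF case and blocksize=None for the infinity case.

```python
import hashlib
import io
import struct

STATUS_MORE = 0   # hypothetical status: more checksum blocks follow
STATUS_LAST = 1   # hypothetical status: last checksum, carries input length


def digest_ranges(f, offset, length, blocksize, algo="sha1"):
    """Return [(start_offset, digest, input_len), ...] for the range."""
    f.seek(offset)
    end = None if length is None else offset + length
    pos = offset
    results = []
    while True:
        # How many bytes may go into this one digest (None = unlimited).
        limit = blocksize
        if end is not None:
            limit = end - pos if limit is None else min(limit, end - pos)
        h = hashlib.new(algo)
        n = 0
        while limit is None or n < limit:
            step = 65536 if limit is None else min(65536, limit - n)
            data = f.read(step)
            if not data:
                break            # hit end of file
            h.update(data)
            n += len(data)
        if n == 0:
            break                # nothing left in the requested range
        results.append((pos, h.digest(), n))
        pos += n
        if limit is None or n < limit:
            break                # whole-range digest done, or short read at EOF
    if not results:
        # Empty range: still report one (empty-input) digest.
        results.append((offset, hashlib.new(algo).digest(), 0))
    return results


def pack_blocks(results, per_block=4):
    """Group digests into <status><offset><result...> blocks."""
    full, (last_off, last_digest, last_len) = results[:-1], results[-1]
    blocks = []
    for i in range(0, len(full), per_block):
        group = full[i:i + per_block]
        blocks.append(struct.pack(">BQ", STATUS_MORE, group[0][0])
                      + b"".join(d for _, d, _ in group))
    # The last block carries exactly one digest plus its input length.
    blocks.append(struct.pack(">BQ", STATUS_LAST, last_off)
                  + last_digest + struct.pack(">Q", last_len))
    return blocks


if __name__ == "__main__":
    demo = io.BytesIO(b"x" * 2500)
    for block in pack_blocks(digest_ranges(demo, 0, None, 1000)):
        print(block[0], len(block))
```

Only the last digest can cover a short input, which is why only the last 
block needs the explicit length; every other digest covers exactly 
Blocksize bytes.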

Maybe some consideration needs to be given to how the offsets and lengths 
are encoded ?  Maybe a 64-bit length should just be used and the matter 
forgotten about.

Maybe some knowledge can be gained from researching how rsync's -c option 
checksums files.

We still have to argue over which method should be used by the default 
reput implementation.
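
For the sake of that argument, the two candidate behaviours for reput could 
be sketched on the client side roughly as below.  The function names are 
hypothetical, and remote_digest is assumed to be whatever the server 
reported for the byte range [0, remote_size) using some checksum request 
like the one discussed above; SHA1 is just a placeholder algorithm.

```python
import hashlib
import io


def resume_offset_plain(local_size, remote_size):
    """FTP-style restart: trust the remote bytes, continue from its size."""
    # A remote file longer than the local one cannot be a partial copy.
    return remote_size if remote_size <= local_size else 0


def resume_offset_checked(local_file, remote_size, remote_digest, algo="sha1"):
    """Checksummed restart: resume only if the remote prefix matches ours.

    Returns the offset to continue from; 0 means start the transfer again.
    """
    local_file.seek(0)
    h = hashlib.new(algo)
    left = remote_size
    while left > 0:
        data = local_file.read(min(65536, left))
        if not data:
            return 0          # local file is shorter than the remote copy
        h.update(data)
        left -= len(data)
    return remote_size if h.digest() == remote_digest else 0


if __name__ == "__main__":
    src = b"abcdefghij" * 1000
    digest = hashlib.sha1(src[:4000]).digest()
    print(resume_offset_plain(len(src), 4000))
    print(resume_offset_checked(io.BytesIO(src), 4000, digest))
```

The checked variant costs a full read of the prefix on both sides before 
any data moves, which is the IO overhead I was worried about for 
interactive use.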

Regards,

Darryl

Damien Miller wrote:

>I think a better option is to have an extension to take a checksum over
>a file, or an arbitrary subset of one. re(get|put) could then validate
>the file before continuing with it.
>
>This would be quite easy to do using the draft-secsh-filexfer
>protocol's vendor extension mechanism, if someone wants to try to
>beat me in implementing it :)
>
>If anyone is interested, please discuss your proposed design on-list.
>
>-d
>
>Darryl L. Miles wrote:
>  
>
>>Ben Lindstrom wrote (a very long time ago) :
>>
>> >The problem is that in some cases the data being sent to you may be out of
>> >order (thankfully no sftp server does this yet). So reget/reput without RFC
>> >clarifications can lead to bad file transfers.
>> >
>> >I'm trying to drag up in my mind which one was the problem... I believe
>> >reput is fine since the client has control over the ordering. reget is
>> >the troublesome one without RFC clarifications stating out of order
>> >transfers are denied.
>> >
>> >If the RFC gets clarified to disallow out of order transfers then a cleaned
>> >up version of this patch may not have a problem getting in.
>>
>>
>>It seems everybody has a patch for this but it still can't quite 
>>make it into any official distribution.  Not wanting to stifle technical 
>>progress, surely the standards body has a mechanism to allow new 
>>concepts to be experimentally deployed without affecting non-cooperating 
>>parties ?
>>
>>Is it really necessary to get RFC clarification on this?  Maybe it's 
>>useful to leave as-is and have the option to execute out-of-order for 
>>other uses.
>>
>>Would it be possible to extend the channel initialisation options to 
>>negotiate a feature requesting "mandatory in-sequence execution of 
>>commands within this channel".  I'm not sure how these options are 
>>created or assigned, but maybe use some OpenSSH namespace until a 
>>standards group either accepts or rejects the concept and assigns it a 
>>standard option name.
>>
>>Non-conforming servers would not understand the option and the client 
>>could then disable the reget/reput commands from use in that session.
>>
>>I do not know enough about the OpenSSH implementation to know if it's 
>>possible for it to ever execute commands out of sequence with respect to 
>>the channel they are in, nor the constraints this may pose to future 
>>maintenance of OpenSSH.
>>
>>To confirm the scope of the option suggested: it says nothing about any 
>>other channel nor the order in which channels are attended to within the 
>>server; this stays as-is.
>>
>>
>>RFC ?  Please Cc your reply.  Thanks.

-- 
Darryl L. Miles
M: 07968 320 114