reget reput again...
Darryl L. Miles
darryl at netbauds.net
Mon Dec 6 23:17:04 EST 2004
I would be in favour of having both mechanisms in place, not wanting to
stifle progress: reput/reget with and without checksumming. However,
RSYNC is a better tool for a checksummed file copy IMHO.
When you start checksumming files, the huge IO overhead at both ends
means you are really talking about a background or batch
file copy/mirroring service. I regard SFTP usage as interactive, and
therefore the user is expected to understand the factors affecting
modification of the source and target files at the time of the
operation. The user would also not want to wait out the IO time for
the file to be read while he is interactively using his SFTP client
program. A simple restarted-transfer facility brings SFTP into line
with FTP without going over the top about it. Having a file
checksum option that is generic would certainly be a useful feature,
which would allow the client to perform an intelligent file copy and also
a checksummed file compare (between two files on the server system too).
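As a rough illustration of the "intelligent file copy" idea, a client
that has per-block checksums for both the local file and the partial
remote file could pick a safe restart point by counting how many
leading blocks agree. This is only a sketch: block_digests and
resume_offset are hypothetical names, SHA1 and the 1024-byte block size
are arbitrary choices, and the "remote" digests are computed locally
here in place of a real server reply.

```python
import hashlib
import io


def block_digests(stream, blocksize, algo="sha1"):
    """Digest each consecutive `blocksize` bytes of a binary stream.

    The final block may be shorter than `blocksize`.
    """
    digests = []
    while True:
        block = stream.read(blocksize)
        if not block:
            break
        digests.append(hashlib.new(algo, block).digest())
    return digests


def resume_offset(local, remote, blocksize, remote_size):
    """Return the byte offset up to which local and remote blocks agree.

    Only whole matching blocks count, so a short trailing block on the
    remote side is simply sent again.
    """
    matched = 0
    for mine, theirs in zip(local, remote):
        if mine != theirs:
            break
        matched += 1
    return min(matched * blocksize, remote_size)


# Simulate an upload that was interrupted after 5000 of 10240 bytes.
BLOCK = 1024
source = bytes(range(256)) * 40
partial = source[:5000]

offset = resume_offset(
    block_digests(io.BytesIO(source), BLOCK),
    block_digests(io.BytesIO(partial), BLOCK),
    BLOCK,
    len(partial),
)
```

In this run the first four blocks match, so the transfer would restart
at the last whole-block boundary rather than at the raw remote size.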
Your new checksumming mechanism should be a primitive building block
that allows the more complex decisions to be made from its result. So a
simple request from client to server of "Report checksum(s) for this file
over this byte range", and maybe an abort of the previous instruction if
the protocol permits this. The client-to-server instruction may be composed of:
File: the target file
Offset: start point to checksum from
Length: end point to checksum to (special EOF case)
Blocksize: report a checksum result every Blocksize bytes (special
infinity case to mean 1 result for the whole file)
Accepted/Preferred list of checksum algos / digests and their usage
options: (maybe MD4/MD5/SHA1 is LSB truncated etc...) don't recode this,
maybe take a look at ASN.1 syntax and numbers. Maybe it's suggested that
a basic algo should exist (MD4 or MD5 ?) in all implementations, but not
as a mandatory requirement. In the worst case scenario the server will
report an error "No matching algo agreed between us".
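The request fields listed above, together with the algorithm
negotiation, might be modelled as follows. This is a sketch under
stated assumptions, not a proposed wire format: ChecksumRequest,
NoMatchingAlgo and choose_algo are hypothetical names, None stands in
for the special "EOF" and "infinity" cases, and algorithms are
identified by plain strings rather than ASN.1 numbers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ChecksumRequest:
    """Client-to-server request: checksum a byte range of one file."""
    path: str                           # File: the target file
    offset: int = 0                     # Offset: where checksumming starts
    length: Optional[int] = None        # Length: None means "up to EOF"
    blocksize: Optional[int] = None     # Blocksize: None means one result
    algos: Sequence[str] = ("md5", "sha1")  # client preference order


class NoMatchingAlgo(Exception):
    """Raised when client and server share no checksum algorithm."""


def choose_algo(request, server_supported):
    """Pick the first client-preferred algorithm the server also offers."""
    for name in request.algos:
        if name in server_supported:
            return name
    raise NoMatchingAlgo("No matching algo agreed between us")


# The server here offers only SHA1 and MD5, so the client's first
# choice (MD4) is skipped.
req = ChecksumRequest("/tmp/example.bin", algos=("md4", "sha1", "md5"))
agreed = choose_algo(req, {"sha1", "md5"})
```

Taking the first match in the client's preference order keeps the
decision with the client, while still letting a server that lacks the
"basic" algorithm fail cleanly with the error above.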
A suggestion: the checksum data coming back to the client should be
in blocks, possibly containing more than one checksum in each block. At
the head of each block is the offset at which the first checksum in
that block starts.
<status><offset_of_result0><result0><result1><result2><result3>
There should be a special final block (sent with a different status to
denote the last checksum); this last block should indicate the exact
length of the input data that went into it.
<status><offset_of_result0><result0><length_of_input_data_to_result0>
Maybe some consideration needs to be given to how the offsets and lengths
are encoded ? Maybe a 64-bit length should just be used and the matter
forgotten about.
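To make the two block layouts concrete, here is one way they could be
packed and unpacked with 64-bit offsets and lengths. Everything
specific is an assumption for illustration: the one-byte status field,
its two values, big-endian byte order (matching the uint64 type used
elsewhere in SSH), and the helper names. The receiver is assumed to
know the digest size from the agreed algorithm.

```python
import struct

STATUS_MORE, STATUS_LAST = 0, 1      # hypothetical status values
HEADER = struct.Struct(">BQ")        # status byte + 64-bit offset
LENGTH = struct.Struct(">Q")         # 64-bit input length (last block)


def encode_block(offset, digests):
    """<status><offset_of_result0><result0><result1>..."""
    return HEADER.pack(STATUS_MORE, offset) + b"".join(digests)


def encode_last_block(offset, digest, input_length):
    """<status><offset_of_result0><result0><length_of_input_data>"""
    return HEADER.pack(STATUS_LAST, offset) + digest + LENGTH.pack(input_length)


def decode_block(payload, digest_size):
    """Return (status, offset, digests, input_length or None)."""
    status, offset = HEADER.unpack_from(payload)
    body = payload[HEADER.size:]
    input_length = None
    if status == STATUS_LAST:
        # The trailing 8 bytes carry the exact length of the last input.
        (input_length,) = LENGTH.unpack(body[-LENGTH.size:])
        body = body[:-LENGTH.size]
    if len(body) % digest_size:
        raise ValueError("block body is not a whole number of digests")
    digests = [body[i:i + digest_size]
               for i in range(0, len(body), digest_size)]
    return status, offset, digests, input_length


# Round-trip three fake 16-byte digests, then a final short block.
fake = [bytes([i]) * 16 for i in range(3)]
middle = decode_block(encode_block(4096, fake), 16)
final = decode_block(encode_last_block(2**40, fake[0], 904), 16)
```

Using fixed 64-bit fields costs a few bytes per block but avoids any
special handling for files larger than 4 GB.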
Maybe some knowledge can be gained from researching how RSYNC's -c option
checksums files.
We still have to argue over which method should be used by the default
reput implementation.
Regards,
Darryl
Damien Miller wrote:
>I think a better option is to have an extension to take a checksum over
>a file, or an arbitrary subset of one. re(get|put) could then validate
>the file before continuing with it.
>
>This would be quite easy to do using the draft-secsh-filexfer
>protocol's vendor extension mechanism, if someone wants to try to
>beat me in implementing it :)
>
>If anyone is interested, please discuss your proposed design on-list.
>
>-d
>
>Darryl L. Miles wrote:
>
>
>>Ben Lindstrom wrote (a very long time ago) :
>>
>> >The problem is in some cases the data being sent to you may be out of
>> >order (thankfully no sftp server does this yet). So reget/reput without RFC
>> >clarifications can lead to bad file transfers.
>> >
>> >I'm trying to drag up in my mind which one was the problem... I believe
>> >reput is fine since the client has control over the ordering. reget is
>> >the troublesome one without RFC clarifications stating out of order
>> >transfers are denied.
>> >
>> >If the RFC gets clarified to disallow out of order transfers then a cleaned
>> >up version of this patch may not have a problem getting in.
>>
>>
>>It seems everybody has a patch for this but it still can't quite
>>make it into any official distribution. Not wanting to stifle technical
>>progress, surely the standards body has a mechanism to allow new
>>concepts to be experimentally deployed without affecting non-cooperating
>>parties ?
>>
>>Is it really necessary to get RFC clarification on this ? Maybe it's
>>useful to leave as-is and have the option to execute out-of-order for
>>some uses.
>>
>>Would it be possible to extend the channel initialisation options to
>>negotiate a feature requesting "mandatory in-sequence execution of
>>commands within this channel" ? I'm not sure how these options are
>>created or assigned but maybe use some OpenSSH namespace until a
>>standards group either accepts or rejects the concept and assigns it a
>>standard option name.
>>
>>Non-conforming servers would not understand the option and the client
>>could then disable the reget/reput commands from use in that session.
>>
>>I do not know enough about the OpenSSH implementation to know if it's
>>possible for it to ever execute commands out of sequence with respect to
>>the channel they are in, nor the constraints this may pose to future
>>maintenance of OpenSSH.
>>
>>To confirm the scope of the option suggested, it says nothing about any
>>other channel nor the order in which channels are attended to within the
>>server, this stays as-is.
>>
>>
>>RFC ? Please Cc your reply. Thanks.
>>
>>
>>
>>
>
>
>
>
--
Darryl L. Miles
M: 07968 320 114
More information about the openssh-unix-dev
mailing list