sftp reget/reput
Dan Kaminsky
dan at doxpara.com
Wed Sep 17 14:03:13 EST 2003
It's a mighty inefficient codepath that literally reads data out of
order and sends it as such; disk seek times are deadly. That being said,
simply implement a cache that handles out-of-order replies and only
writes complete windows of data to disk. This does mean memory usage
can grow when a small block goes missing, but we can certainly bound
that by monitoring our number of outstanding requests and declining to
issue more when the server obstinately refuses to give us one particular
entry.
This is, of course, directly analogous to a TCP Window.
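The cache described above can be sketched roughly as follows. This is a
minimal illustration, not the actual sftp client code; the names
(struct window, window_add) and the window size are invented. Replies
land in whatever slot their offset maps to, and only the contiguous
prefix is ever flushed to disk:

```c
/* Sketch of an out-of-order reassembly cache, TCP-window style.
 * Hypothetical names; not from the OpenSSH source. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BLOCK  8192
#define WINDOW 4                 /* max outstanding requests */

struct window {
    size_t base;                 /* file offset of first unflushed block */
    int    have[WINDOW];         /* slot i: block (base/BLOCK + i) arrived? */
    char   buf[WINDOW][BLOCK];
};

/* Accept one reply of BLOCK bytes at offset `off`; flush whatever
 * contiguous prefix is now complete.  Returns blocks flushed, or -1 if
 * the reply falls beyond the window (caller must stop issuing requests
 * until the missing block arrives). */
static int window_add(struct window *w, size_t off, const char *data,
                      FILE *out)
{
    size_t slot = (off - w->base) / BLOCK;
    if (slot >= WINDOW)
        return -1;
    memcpy(w->buf[slot], data, BLOCK);
    w->have[slot] = 1;

    int flushed = 0;
    while (w->have[0]) {         /* write only complete, in-order data */
        fwrite(w->buf[0], 1, BLOCK, out);
        memmove(w->have, w->have + 1, sizeof(w->have) - sizeof(w->have[0]));
        memmove(w->buf,  w->buf + 1,  sizeof(w->buf)  - sizeof(w->buf[0]));
        w->have[WINDOW - 1] = 0;
        w->base += BLOCK;
        flushed++;
    }
    return flushed;
}
```

The -1 return is where the back-pressure comes in: memory stays bounded
because the client simply stops pipelining once the window is full of
data waiting behind one missing block.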
--Dan
Markus Friedl wrote:
>we could modify the protocol and implement
>rolling checksums like Niels Provos suggests:
>
> MD5_CTX ctx1, ctx2;
>
> MD5_Init(&ctx1);
>
> while new page in data
> MD5_Update(&ctx1, newpage, pagesize)
> ctx2 = ctx1;
> MD5_Final(digest, &ctx2)
> if (compare with remote not equal)
> break;
> end while
>
> continue data transfer.
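Markus's pseudocode might be fleshed out roughly as below. To keep the
sketch dependency-free, a toy FNV-1a running hash stands in for MD5
here; a real implementation would use MD5_CTX/MD5_Update/MD5_Final
exactly as in the pseudocode, and the copy-before-compare mirrors
`ctx2 = ctx1` so finalizing never disturbs the rolling state. The
function name and the local/remote comparison-in-one-process are
illustrative only (in the real protocol the remote side would compute
and send its digests):

```c
/* Hedged sketch of rolling per-page checksums to find a resume point.
 * FNV-1a is a stand-in for MD5; names are invented. */
#include <stddef.h>
#include <stdint.h>

#define PAGESIZE 4096

typedef struct { uint64_t h; } roll_ctx;   /* plays the role of MD5_CTX */

static void roll_init(roll_ctx *c)
{
    c->h = 1469598103934665603ULL;         /* FNV-1a offset basis */
}

static void roll_update(roll_ctx *c, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c->h ^= p[i];
        c->h *= 1099511628211ULL;          /* FNV-1a prime */
    }
}

/* Return the byte offset of the first page where local and remote data
 * diverge (the resume point); `len` must be a multiple of PAGESIZE. */
static size_t resume_offset(const unsigned char *local,
                            const unsigned char *remote, size_t len)
{
    roll_ctx l, r;
    roll_init(&l);
    roll_init(&r);
    for (size_t off = 0; off < len; off += PAGESIZE) {
        roll_update(&l, local + off, PAGESIZE);
        roll_update(&r, remote + off, PAGESIZE);
        /* copy-then-compare, mirroring ctx2 = ctx1 / MD5_Final(ctx2):
         * the rolling contexts keep accumulating across pages */
        roll_ctx lc = l, rc = r;
        if (lc.h != rc.h)
            return off;                    /* restart transfer here */
    }
    return len;                            /* everything matched */
}
```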
>
>On Wed, Sep 17, 2003 at 11:12:36AM +0800, Dmitry Lohansky wrote:
>
>
>>Hello openssh@
>>
>>I thought about sftp's reget/reput commands.
>>
>>Several days ago, Damien Miller wrote to tech at openbsd.org (it was a
>>reply to my letter):
>>
>>
>>
>>>Herein lies a problem which is not easy to detect or solve. For
>>>performance reasons, the sftp client does pipelined reads/writes when
>>>transferring files. The protocol spec allows for a server to process
>>>these requests out of order. For example:
>>>
>>>
>>>client server
>>>------ ------
>>>open file your file handle is "blah"
>>>gimme bytes 0-8191
>>>gimme bytes 8192-16383
>>>gimme bytes 16384-24575
>>>gimme bytes 24576-32767 here are bytes 24576-32767
>>>close file here are bytes 16384-24575
>>> here are bytes 8192-16383
>>> here are bytes 0-8191
>>> close successful
>>>
>>>
>>>If the client writes the bytes out in the order they are received (which
>>>it probably should, to avoid buffering large amounts of data) then an
>>>interruption will leave a full-length, but "holey" file on disk. There
>>>is no general way to determine how to resume such a transfer.
>>>
>>>
>>>The best the client can do to make transfers resumable is ftruncate()
>>>the file at the highest contiguous byte received. This will stop the
>>>potential corruption on resume.
>>>
>>>
>>This is a good method, but if the client crashes we may still get a "hole".
>>What do you think about the following approach?
>>
>>Storing extra-data at the end of file, for example:
>>
>><---orig-part-><-extra->
>>[*][ ][*][ ][*][*******]
>><---------file--------->
>>
>>where [*] - already loaded data, [ ] - not yet
>>
>>In the extra part, we can store which blocks were already loaded, along
>>with their offsets and sizes. After the download completes, the extra
>>part is removed.
>>
>>Comments?
>>--
>> Dmitry Lohansky
>>
>>_______________________________________________
>>openssh-unix-dev mailing list
>>openssh-unix-dev at mindrot.org
>>http://www.mindrot.org/mailman/listinfo/openssh-unix-dev
>>
>>
>
>