sftp reget/reput

Thu Sep 18 00:48:34 EST 2003

I would love to see a modification to the SSH protocol that lets the
client ask the server for a checksum without transferring the data
itself.

It would be useful in reget, and would be useful for people building more
advanced features to detect file changes and download only the blocks that
need updating.

I've thought of it off and on, but never sat down to implement it and
write up a document on it.

- Ben
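
A minimal sketch of how such a checksum request could serve reget,
following the per-page scheme quoted below. Everything here is an
assumption for illustration: the function names are hypothetical, no such
request exists in the protocol as discussed, and 64-bit FNV-1a stands in
for MD5 only so the example needs no crypto library (it is not collision
resistant). The server side would produce one running digest per full
page; the client hashes its partial file the same way and resumes at the
first page whose digest differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET_BASIS 14695981039346656037ULL
#define FNV_PRIME        1099511628211ULL

/* Stand-in for MD5_Update: fold n bytes into a running FNV-1a hash. */
static uint64_t fnv1a_update(uint64_t h, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical server side: out[i] is the hash of everything from byte 0
 * up to the end of full page i. Returns the number of digests written. */
size_t prefix_digests(const unsigned char *data, size_t len, size_t pagesize,
                      uint64_t *out, size_t max_out)
{
    assert(pagesize > 0);
    uint64_t h = FNV_OFFSET_BASIS;
    size_t n = 0;
    for (size_t off = 0; n < max_out && len - off >= pagesize; off += pagesize) {
        h = fnv1a_update(h, data + off, pagesize);
        out[n++] = h;
    }
    return n;
}

/* Hypothetical client side: returns how many leading bytes of the local
 * file agree with the remote digests. Only full pages are compared; a
 * trailing partial page is simply fetched again. */
size_t resume_offset(const unsigned char *local, size_t len, size_t pagesize,
                     const uint64_t *remote, size_t nremote)
{
    assert(pagesize > 0);
    uint64_t h = FNV_OFFSET_BASIS;
    size_t off = 0;
    for (size_t i = 0; i < nremote && len - off >= pagesize; i++) {
        h = fnv1a_update(h, local + off, pagesize);
        if (h != remote[i])
            break;          /* first differing page: resume here */
        off += pagesize;
    }
    return off;
}
```

Because each digest covers the whole prefix, a single mismatch means
every later digest also mismatches, so the client can stop at the first
difference.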

On Wed, 17 Sep 2003, Markus Friedl wrote:

> we could modify the protocol and implement
> rolling checksums like Niels Provos suggests:
>
> 	MD5_CTX ctx1, ctx2;
>
> 	MD5_Init(&ctx1);
>
> 	/* hash the local data one page at a time */
> 	while (new page in data) {
> 		MD5_Update(&ctx1, newpage, pagesize);
> 		ctx2 = ctx1;	/* copy, so ctx1 can keep accumulating */
> 		MD5_Final(digest, &ctx2);
> 		if (digest differs from the remote digest)
> 			break;
> 	}
>
> 	/* continue data transfer from here */
>
> On Wed, Sep 17, 2003 at 11:12:36AM +0800, Dmitry Lohansky wrote:
> > Hello openssh@
> >
> > I thought about sftp's reget/reput commands.
> >
> > Several days ago, Damien Miller wrote to tech at openbsd.org (it was a
> > reply to my letter):
> >
> > > Herein lies a problem which is not easy to detect or solve. For
> > > performance reasons, the sftp client does pipelined reads/writes when
> > > transferring files. The protocol spec allows for a server to process
> > > these requests out of order. For example:
> >
> > > client                     server
> > > ------                     ------
> > > open file                  your file handle is "blah"
> > > gimme bytes 0-8191
> > > gimme bytes 8192-16383
> > > gimme bytes 16384-24575
> > > gimme bytes 24576-32767    here are bytes 24576-32767
> > > close file                 here are bytes 16384-24575
> > >                            here are bytes 8192-16383
> > >                            here are bytes 0-8191
> > >                            close successful
> >
> > > If the client writes the bytes out in the order they are received (which
> > > it probably should, to avoid buffering large amounts of data) then an
> > > interruption will leave a full-length, but "holey" file on disk. There
> > > is no general way to determine how to resume such a transfer.
> >
> > > The best the client can do to make transfers resumable is ftruncate()
> > > the file at the highest contiguous byte received. This will stop the
> > > potential corruption on resume.
> >
> > This is a good method, but if the client crashes, we may still get a
> > "hole". What do you think about the following approach?
> >
> > Store extra data at the end of the file, for example:
> >
> > <---orig-part-><-extra->
> > [*][ ][*][ ][*][*******]
> > <---------file--------->
> >
> > where [*] is data already loaded and [ ] is data not yet loaded.
> >
> > In the extra part, we can store which blocks were already loaded, along
> > with each block's offset and size. After the download completes, the
> > extra part is removed.
> >
> > Comments?
> > --
> >  Dmitry Lohansky
> >
> > _______________________________________________
> > openssh-unix-dev mailing list
> > openssh-unix-dev at mindrot.org
> > http://www.mindrot.org/mailman/listinfo/openssh-unix-dev
>
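
The "highest contiguous byte received" mentioned in the quoted text can
be computed from the list of blocks that arrived, whatever order they
came in. The sketch below is only an illustration under assumptions:
`struct block` and `contiguous_prefix` are hypothetical names, not taken
from the sftp client. It sorts the received (offset, length) pairs and
walks them until the first gap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One received read reply: byte range [off, off + len). */
struct block {
    uint64_t off;
    uint64_t len;
};

/* qsort comparator: order blocks by starting offset. */
static int cmp_block(const void *a, const void *b)
{
    const struct block *x = a, *y = b;
    return (x->off > y->off) - (x->off < y->off);
}

/* Returns the number of bytes received without a gap, counted from
 * byte 0. Sorts the array in place. */
uint64_t contiguous_prefix(struct block *blocks, size_t n)
{
    assert(n == 0 || blocks != NULL);
    if (n == 0)
        return 0;
    qsort(blocks, n, sizeof(blocks[0]), cmp_block);

    uint64_t end = 0;
    for (size_t i = 0; i < n; i++) {
        if (blocks[i].off > end)
            break;                      /* gap: stop at the hole */
        if (blocks[i].off + blocks[i].len > end)
            end = blocks[i].off + blocks[i].len;
    }
    return end;
}
```

On interruption the client would pass the returned value to ftruncate(),
so that a later resume starts from a file with no holes. In the example
exchange quoted above, if only bytes 24576-32767 and 16384-24575 had
arrived, the result is 0 and the whole file is fetched again.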