Support for transferring sparse files via scp/sftp correctly?
Ron Frederick
ronf at timeheart.net
Sat Mar 29 16:06:48 AEDT 2025
Sorry for the mis-send earlier. Here’s the complete message I meant to send.
On Mar 6, 2025, at 9:38 PM, Damien Miller <djm at mindrot.org> wrote:
> If you want this to happen, I recommend starting by figuring out what
> protocol extensions need to be made, and how to support sparse files
> on system without SEEK_DATA/HOLE - it should be pretty to do this on
> upload without these flags and without extensions.
I was inspired by this thread to add sparse file support to AsyncSSH, on OSes that support SEEK_DATA and SEEK_HOLE. It looks like I should also be able to get this to work on Windows with FSCTL_QUERY_ALLOCATED_RANGES and FSCTL_SET_SPARSE, but I haven’t gotten to that yet.
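As a rough illustration of the SEEK_DATA/SEEK_HOLE approach mentioned above, here is a minimal Python sketch that lists the data ranges of an open file. The helper name data_ranges is hypothetical and this is not AsyncSSH's actual code. It assumes an OS where os.SEEK_DATA and os.SEEK_HOLE are available; on a filesystem that does not track holes, the whole file simply comes back as one range.

```python
import errno
import os


def data_ranges(fd, offset=0, length=None):
    """Yield (offset, length) pairs for the data regions of an open file.

    Hypothetical helper for illustration only. Requires os.SEEK_DATA and
    os.SEEK_HOLE, which are not available on every platform.
    """
    size = os.fstat(fd).st_size
    end = size if length is None else min(size, offset + length)
    pos = offset
    while pos < end:
        try:
            # Find the start of the next data region at or after pos.
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                # Only a hole remains between pos and end of file.
                return
            raise
        if start >= end:
            return
        # There is an implicit hole at end of file, so this finds one.
        hole = os.lseek(fd, start, os.SEEK_HOLE)
        stop = min(hole, end)
        yield start, stop - start
        pos = stop
```

Note that the lseek() calls move the file position of the descriptor, so a caller that also reads through the same descriptor would need to seek again afterwards.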
As Darren Tucker said, the put() operation here can be made to work with any SFTP server. However, an SFTP extension is required to support this for get() or copy(), and for the case where the copy-data extension is used to copy data between files on a remote server without reading it and writing it back over the wire.
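To show why the upload direction needs nothing special from the server, here is a small sketch that copies only the data ranges of a source file to a destination using ordinary seek-and-write calls. The function write_ranges and its parameters are hypothetical, and it operates on local file objects as a stand-in for writes at explicit offsets. Setting the final size with truncate() to cover a trailing hole is an assumption of this sketch, not something stated in this message.

```python
def write_ranges(src, dst, ranges, total_size, block_size=65536):
    """Copy only the given (offset, length) ranges from src to dst.

    Hypothetical helper: src and dst are seekable binary file objects.
    Regions that are never written are left as holes where the
    filesystem supports them.
    """
    for offset, length in ranges:
        src.seek(offset)
        dst.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = src.read(min(block_size, remaining))
            if not chunk:
                break  # source is shorter than the range claimed
            dst.write(chunk)
            remaining -= len(chunk)
    # A trailing hole is never written, so set the final size explicitly.
    dst.truncate(total_size)
```

Only the data ranges are sent, and everything the receiver sees is a sequence of plain writes at offsets, which is why no range query is needed on the receiving side.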
I’ve defined an extension called "ranges@asyncssh.com" which is modeled somewhat after FXP_READDIR for getting valid data ranges in a remote file. Each call can return multiple ranges, but on files with a large number of ranges you may need to send this request multiple times to get the complete list. This allows for the copying to be interleaved with getting back range responses.
The request looks like the following:
    uint32 id
    string "ranges@asyncssh.com"
    string handle
    uint64 offset
    uint64 length
This requests valid data ranges in the file associated with the request handle. The offset and length specify the portion of the file which the ranges should be returned for. The response looks like:
    uint32 id
    uint32 count
    repeats count times:
        uint64 offset
        uint64 length
    bool end-of-list [optional]
The count specifies the number of ranges in the reply. It is followed by an optional bool which indicates whether there are any more valid data ranges within the requested offset and length. If there are no entries at all within the request range, an FXP_STATUS of FX_EOF should be sent.
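The two layouts above can be sketched with Python's struct module. The function names are hypothetical and this is not taken from AsyncSSH. The sketch covers only the fields listed in this message, so the outer SFTP packet length and type byte are left out. It assumes the extension name is spelled with "@" (the list archive shows it as "at"), and it assumes a true end-of-list value means no further ranges remain, which the description does not spell out.

```python
import struct

# Assumed spelling of the extension name (the archive obscures the "@").
EXT_NAME = b"ranges@asyncssh.com"


def _string(data):
    """Encode an SSH string: uint32 length followed by the bytes."""
    return struct.pack(">I", len(data)) + data


def encode_ranges_request(req_id, handle, offset, length):
    """Build the request fields: id, extension name, handle, offset, length."""
    return (struct.pack(">I", req_id) + _string(EXT_NAME) + _string(handle)
            + struct.pack(">QQ", offset, length))


def decode_ranges_reply(payload):
    """Parse the reply fields into (id, [(offset, length), ...], at_end).

    at_end is None when the optional bool is absent. Treating True as
    "no more ranges" is an assumption of this sketch.
    """
    req_id, count = struct.unpack_from(">II", payload, 0)
    pos = 8
    ranges = []
    for _ in range(count):
        ranges.append(struct.unpack_from(">QQ", payload, pos))
        pos += 16
    at_end = None
    if pos < len(payload):
        at_end = payload[pos] != 0
    return req_id, ranges, at_end
```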
If you don’t get all of the requested ranges in a single request, additional requests can be sent starting just past the end of the last range previously returned.
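The follow-up request logic might look like the sketch below. Here request_ranges is a hypothetical callable standing in for one round trip to the server: it is assumed to return a (ranges, at_end) pair, or None when the server answers with FX_EOF. As above, a true at_end is taken to mean the list is complete, which is an assumption.

```python
def all_ranges(request_ranges, offset, length):
    """Collect every data range within [offset, offset + length).

    request_ranges(offset, length) is a hypothetical callable performing
    one request; it returns (ranges, at_end), or None on FX_EOF.
    """
    end = offset + length
    result = []
    while offset < end:
        reply = request_ranges(offset, end - offset)
        if reply is None:
            break  # FX_EOF: no ranges left in the requested span
        ranges, at_end = reply
        if not ranges:
            break
        result.extend(ranges)
        # Resume just past the end of the last range returned.
        last_offset, last_length = ranges[-1]
        offset = last_offset + last_length
        if at_end:
            break
    return result
```

A real client would probably not wait for the full list before copying, since the point of paging is that data transfer can start as soon as the first reply arrives.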
What do you think?
--
Ron Frederick
ronf at timeheart.net