Support for transferring sparse files via scp/sftp correctly?

Ron Frederick ronf at timeheart.net
Fri Apr 4 15:59:35 AEDT 2025


On Apr 3, 2025, at 6:02 PM, Darren Tucker <dtucker at dtucker.net> wrote:
> On Sat, 29 Mar 2025 at 16:14, Ron Frederick <ronf at timeheart.net <mailto:ronf at timeheart.net>> wrote:
>> [...]
>> If you don’t get all of the requested ranges in a single request, additional requests can be sent starting at just past the end of the last range previously returned.
>> 
>> What do you think?
> 
> That seems like it'd work well for things with SEEK_HOLE or equivalent, although there's always the chance of the underlying file changing between mapping it out and doing the transfer.

Since my last message, I’ve also implemented support for this on Windows, which has a DeviceIoControl operation called FSCTL_QUERY_ALLOCATED_RANGES that returns an array of offset and length values within a given range of a file (also specified by offset and length). So, it’s almost a direct mapping to the extension I proposed. I basically have three different versions of a request_ranges() function (Windows, systems with SEEK_DATA/SEEK_HOLE, and a dummy implementation for all other platforms which just returns the full range passed in).
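For reference, the SEEK_DATA/SEEK_HOLE variant could look roughly like this in Python. This is a hedged sketch, not the actual implementation: the name request_ranges() matches the description above, but the exact loop structure and error handling here are my own assumptions.

```python
import errno
import os

def request_ranges(fd, offset, length):
    """Return a list of (offset, length) allocated ranges in [offset, offset+length).

    Sketch using SEEK_DATA/SEEK_HOLE; on platforms without them, fall
    back to reporting the full requested range (the "dummy" case).
    """
    end = offset + length

    if not hasattr(os, 'SEEK_DATA'):
        return [(offset, length)]

    ranges = []
    pos = offset

    while pos < end:
        try:
            # Find the start of the next run of data at or after pos
            data_start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:    # pos is in a trailing hole
                break
            raise

        if data_start >= end:
            break

        # SEEK_HOLE always succeeds: there is a virtual hole at EOF
        hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
        data_end = min(hole_start, end)
        ranges.append((data_start, data_end - data_start))
        pos = hole_start

    return ranges
```

Note that on filesystems which don't track holes, SEEK_DATA/SEEK_HOLE degrade gracefully: the whole file is reported as one data range, same as the dummy fallback.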

The risk of missing data due to file changes is no different from what could happen if you were reading data sequentially and something wrote to the source file after you had already copied that part of it.


> Damien pointed out that it's possible to do a reasonable but not perfect sparse file support by memcmp'ing your existing file buffer with a block of zeros and skipping the write if it matches.  OpenBSD's cp(1) does this (look for "skipholes"): https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/bin/cp/utils.c?annotate=HEAD.

Yeah - I’ve thought about implementing something like this as a second pass: OSes with the ability to return ranges would tell you which parts of the local file not to read at all, and then within the ranges that are reported, the code could strip off additional null bytes at the beginning or end of the blocks it reads, including skipping some block writes entirely if a read returned all nulls. This could also be applied to the entire file on systems with no ability to return ranges.
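A rough sketch of that second pass, as applied on the write side (the helper name is hypothetical, and it assumes the output file's full length has already been set, e.g. with ftruncate, so that skipped regions remain holes):

```python
import os

def write_skipping_zeros(outfd, offset, block):
    """Write block at offset, trimming runs of null bytes from both ends.

    An all-zero block is skipped entirely, leaving a hole. Returns the
    number of bytes actually written.
    """
    lead = len(block) - len(block.lstrip(b'\0'))
    if lead == len(block):
        return 0                               # all zeros: leave a hole

    trail = len(block) - len(block.rstrip(b'\0'))
    data = block[lead:len(block) - trail]
    os.pwrite(outfd, data, offset + lead)
    return len(data)
```

This is roughly the write-side equivalent of the memcmp-against-zeros trick in OpenBSD's cp(1), applied per block within whatever ranges were reported (or the whole file when ranges aren't available).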


> This seems surprisingly effective in the case where you already have the file content in a buffer anyway, but it would be harder to do (or at least more expensive) as part of a separate request type that returns the ranges.  It'd be easier to implement if there was some kind of "read-sparse" operation that could return a list of {offset, len, data} instead of just the offsets and lengths.  This would reduce the time between the sparse check and the read although it's still potentially racy.

If the alternative is reading the file in its entirety on a platform that doesn’t support requesting sparse ranges, I agree that doing the read followed by a memory compare and a seek is always going to be better than doing a write. However, if you are doing a get or copy operation where the read has to go over the network, that approach isn’t nearly as effective. Also, even for a put operation where reads are local, it can take a very long time to read and skip large ranges (think gigabyte or larger holes) when ranges are not returned. Some of the use cases I found around sparse files involved files with a total size in terabytes or even petabytes, but populated very sparsely. You probably wouldn’t want to use the memory compare approach on something that large.
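To illustrate why ranges help here, a rough sketch of the client-side copy loop, with read and write passed in as stand-ins for the actual file or network I/O and the range list supplied by something like a request_ranges() call (all names hypothetical):

```python
def sparse_copy(read, write, ranges, blocksize=65536):
    """Copy only the allocated (offset, length) ranges of a sparse file.

    Holes are never read or transferred at all, which is the win over
    read-and-compare when reads cross the network or when holes are
    gigabytes (or terabytes) in size. Returns the bytes copied.
    """
    copied = 0
    for offset, length in ranges:
        pos = offset
        end = offset + length
        while pos < end:
            count = min(blocksize, end - pos)
            write(pos, read(pos, count))
            pos += count
            copied += count
    return copied
```

With a 1 TB file holding a few megabytes of data, this loop issues a handful of reads, where a read-and-compare pass would still have to pull the full terabyte through read().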
-- 
Ron Frederick
ronf at timeheart.net

More information about the openssh-unix-dev mailing list