Support for transferring sparse files via scp/sftp correctly?

Ron Frederick ronf at timeheart.net
Sat Apr 5 10:42:03 AEDT 2025


Hi Lionel,

On Apr 4, 2025, at 2:59 PM, Lionel Cons <lionelcons1972 at gmail.com> wrote:
>> Damien pointed out that it's possible to provide reasonable but not perfect sparse file support by memcmp'ing your existing file buffer with a block of zeros and skipping the write if it matches.  OpenBSD's cp(1) does this (look for "skipholes"): https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/bin/cp/utils.c?annotate=HEAD.
> 
> This should not be done. Either a system has SEEK_DATA/SEEK_HOLE or
> Win32's (Windows & ReactOS) FSCTL_QUERY_ALLOCATED_RANGES, or it should
> just copy all bytes.
> 
> The misunderstanding is that sequences of 0x00 bytes are automatically
> holes. That is not true. Holes represent ranges of "no data", and are
> only read back as 0x00 bytes for backwards compatibility. Valid data
> ranges can contain long sequences of 0x00 bytes, so PLEASE don't
> invent extra holes in sparse files just because they contain
> sequences of 0x00 bytes.


My current implementation matches what you describe, copying all of the ranges marked as containing data regardless of their content. However, I am curious what the concern would be. From a pure data-reading perspective, the data should be identical, and the “extra” holes that get created would allow the file to take up less space on disk. Are you saying there are applications that actually make decisions based on the ranges returned by FSCTL_QUERY_ALLOCATED_RANGES (or SEEK_DATA/SEEK_HOLE) and behave differently on “no data” vs. null bytes? How would such code deal with the fact that the filesystem sometimes allocates a data range larger than the range actually written?
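
For reference, here is a minimal sketch of how a copy loop can walk only the allocated ranges, assuming a platform that provides SEEK_DATA/SEEK_HOLE (the helper name list_data_ranges is just for illustration, not code from my implementation). The ranges it reports are what I copy today, and, as noted above, they can be larger than what was actually written:

    #define _GNU_SOURCE          /* for SEEK_DATA/SEEK_HOLE on Linux */
    #include <stdio.h>
    #include <unistd.h>
    #include <errno.h>

    /* Print the data ranges of a file as reported by the filesystem.
     * The reported ranges cover all written data, but may be larger
     * than the regions the application actually wrote. */
    static int list_data_ranges(int fd)
    {
        off_t end = lseek(fd, 0, SEEK_END);
        off_t data = 0, hole;

        if (end < 0)
            return -1;

        while (data < end) {
            data = lseek(fd, data, SEEK_DATA);
            if (data < 0)
                return errno == ENXIO ? 0 : -1;   /* ENXIO: trailing hole */

            hole = lseek(fd, data, SEEK_HOLE);    /* EOF counts as a hole */
            if (hole < 0)
                return -1;

            printf("data: %lld..%lld\n", (long long)data, (long long)hole);
            data = hole;
        }
        return 0;
    }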

I currently have an argument that controls whether a copy will be sparse or not. I could imagine adding a separate argument to control this null-matching behavior (probably defaulting to off). Would that address your concern?

I don’t know if this extra processing is really worth the trouble, but there are some cases where it might be valuable given the way sparse file range allocation works, at least on macOS. In experiments I’ve run, data ranges on macOS can be as small as 16 KB, but if you write to two different ranges within a 16 _megabyte_ region of a file, macOS will allocate a single data range that covers both of the ranges actually written plus all of the bytes in between them (showing up as one big range with the middle filled with null bytes). It could be argued that the region between the two written ranges is a “false” data range that really should have remained a hole. Code looking for null bytes could avoid having to read and forward potentially tens of megabytes of zeros in each of these “false” data ranges over SFTP. I haven’t looked closely at Windows to see if it has similar behavior, but I did see that its allocated data ranges tend to be at least 64 KB in size even when the actual writes are smaller than that. That’s not as bad as the macOS case, but there could be some savings there.
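
As a rough illustration of the optional null-matching pass I have in mind (a sketch, not my actual implementation), a check like the one below could be run on each block read from one of those data ranges, skipping the write and seeking forward in the destination when the block is all zeros. This is essentially the memcmp trick from OpenBSD's cp(1) mentioned earlier:

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Return true if the buffer contains only zero bytes, comparing it
     * in chunks against a static block of zeros.  A caller copying a
     * data range can seek forward in the destination file instead of
     * writing such a block, leaving a hole behind. */
    static bool block_is_zero(const unsigned char *buf, size_t len)
    {
        static const unsigned char zeros[65536];

        while (len > 0) {
            size_t n = len < sizeof(zeros) ? len : sizeof(zeros);

            if (memcmp(buf, zeros, n) != 0)
                return false;

            buf += n;
            len -= n;
        }
        return true;
    }

Since the receiver reads back the same bytes either way, keeping this off by default should only matter to applications that inspect the range map itself, which is exactly the concern raised above.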
-- 
Ron Frederick
ronf at timeheart.net
