Support for transferring sparse files via scp/sftp correctly?
Lionel Cons
lionelcons1972 at gmail.com
Fri Mar 7 21:49:00 AEDT 2025
On Fri, 7 Mar 2025 at 06:45, Damien Miller <djm at mindrot.org> wrote:
>
> On Wed, 5 Mar 2025, Cedric Blancher wrote:
>
> > On Tue, 4 Mar 2025 at 21:22, Chris Rapier <rapier at psc.edu> wrote:
> > >
> > >
> > >
> > > On 3/4/25 05:34, Philipp Marek via openssh-unix-dev wrote:
> > > >> Does OpenSSH scp/sftp mode transfer sparse files correctly, i.e. are
> > > >> holes skipped and not transferred as chunks of 0 bytes? [1]
> > > >>
> > > >> We're asking about sparse files in the >= 1PB range, which consists of
> > > >> multi-TB holes with around 600-2000GB of valid data.
> > > >
> > > >
> > > > Perhaps rsync would be a good fit here,
> > > > it supports --sparse.
> > >
> > > I think one of the issues you are going to face is that SEEK_DATA and
> > > SEEK_HOLE don't seem to be currently supported under OpenBSD. Since
> > > that's the home OS for OpenSSH this could create portability issues.
> > > While you can get around that with the judicious use of defines it means
> > > that the feature set will start to shift between different OSes.
> >
> > OpenBSD unfortunately does not implement so many other APIs. But other
> > OS do implement SEEK_DATA+SEEK_HOLE, including FreeBSD, Linux,
> > Solaris, Illumos and even Cygwin. Even NFS has a SEEK to lookup holes
> > and data sections in files.
> > SEEK_HOLE+SEEK_DATA are also now part of the POSIX standard, so IMO it
> > is time to face the bug that sparse files are not handled correctly
> > and fix it
>
> You and the others on this thread are IIRC the first people in sftp's
> 24 year history to ever ask for sparse file support.
This is actually not true. ssh.com's ssh had a patch for sparse file
support, made by the same people at SUN who did the "X11 untrusted
cookie" (ssh -Y) work (Alan Coopersmith, Roland Mainz).
But it never made it past a patch, because it was Solaris-specific, and
each filesystem vendor on other platforms had its own custom API to
look up data ranges in sparse files. Even worse, some APIs were slow
because they enumerated ALL hole and data ranges, which is a no-go for
multi-petabyte files these days (SEEK_HOLE works fine, and so does
FSCTL_QUERY_ALLOCATED_RANGES on Windows).
This was 22 years ago.
The patched ssh.com ssh fell out of use around 8 years ago,
and since then we have a cumbersome and fragile workaround with tar
files as containers for sftp transfers in place. That thing never
really works reliably, and shows up on the management radar as "IT
outage" too often.
22 years later (counting from the release of the original ssh.com
sparse file patch), we have SEEK_DATA+SEEK_HOLE as an established POSIX
standard
(https://pubs.opengroup.org/onlinepubs/9799919799/functions/lseek.html),
which is supported on Linux, Solaris, FreeBSD and even Cygwin.
So in my humble opinion it would be nice to work on sparse file support
in OpenSSH.
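
To illustrate why lseek() avoids the "enumerate everything up front"
problem, here is a minimal sketch of walking a file's data ranges one
at a time. This is not OpenSSH code and not a proposed patch: the names
walk_data_extents(), extent_cb and count_data_bytes() are made up for
this example, and it assumes a platform whose headers define SEEK_DATA
and SEEK_HOLE (on glibc that needs _GNU_SOURCE before the first
include). It also assumes the file is not modified while it is walked.

```c
#define _GNU_SOURCE             /* glibc: expose SEEK_DATA / SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback: invoked once per data range [start, end). */
typedef int (*extent_cb)(off_t start, off_t end, void *ctx);

/*
 * Visit each data range of fd in order, skipping holes.
 * Returns 0 on success, -1 if lseek() fails or the callback
 * returns non-zero.
 */
int walk_data_extents(int fd, extent_cb cb, void *ctx)
{
    off_t pos = 0;

    assert(cb != NULL);
    for (;;) {
        /* Next offset >= pos that contains data. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data == (off_t)-1) {
            /* ENXIO: no data at or after pos, so we are done. */
            return errno == ENXIO ? 0 : -1;
        }
        /* End of this data range; EOF counts as an implicit hole. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole == (off_t)-1)
            return -1;
        if (cb(data, hole, ctx) != 0)
            return -1;
        pos = hole;
    }
}

/* Example callback: add up the number of bytes in data ranges. */
static int add_extent(off_t start, off_t end, void *ctx)
{
    *(off_t *)ctx += end - start;
    return 0;
}

/* Returns the total size of all data ranges in path, or -1 on error. */
off_t count_data_bytes(const char *path)
{
    off_t total = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    int rc = walk_data_extents(fd, add_extent, &total);
    close(fd);
    return rc == 0 ? total : -1;
}
```

A sender would replace add_extent() with something that reads and
transmits each [start, end) range; the receiver then only needs the
offsets and the final file size to recreate the holes. On a filesystem
that does not track holes, the whole file is typically reported as a
single data range, so the loop degrades to a plain full copy.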
Lionel