Support for transferring sparse files via scp/sftp correctly?
Darren Tucker
dtucker at dtucker.net
Wed Mar 5 21:14:00 AEDT 2025
On Tue, Mar 04, 2025 at 02:43:10PM +0100, Lionel Cons wrote:
[...]
> Really: Built In sparse file support, which is on by default, makes
> more sense, as we do not have to maintain&update&administer lots of
> tools just to get the job done. It's also less error-prone.
>
> FYI Sparse files are nothing new or magic, they have been around since
> the dawn of filesystems, and even WinXP&WinServer2000 have sparse file
> support.
I wasn't aware that SEEK_HOLE and SEEK_DATA had even been
standardised, although it looks like that happened only some time last year.
As others have noted, they are still not universally available.
Having looked at it:
- support in scp-the-protocol is a non-starter (along with
pretty much every other proposed extension thereof).
- sftp (including the now-default sftp mode in scp(1)) seems like it's
possible for client "put", since the client can glean sufficient
information via the new lseek interfaces, and the sftp protocol is
sufficiently flexible to implement it. See example patch below.
- I don't see sufficient information available in the sftp protocol
from the server to the client to support it for client "get".
Certainly the secsh-filexfer-02[0] (ie v3) version that OpenSSH
implements doesn't, but even the most recent -13 drafts seem
to support only a per-file boolean that indicates whether sparseness is on or off.
I don't see a way for a client to determine the location and/or size of
any holes in a remote file in order to replicate them on a downloaded
file. The only way I can see it could be supported is by adding a
vendor extension (which would need to be supported by both client
and server) that could supply the information about holes/extents,
which would be a larger undertaking.
[0] https://datatracker.ietf.org/doc/html/draft-ietf-secsh-filexfer-02
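To illustrate the lseek interfaces mentioned above, here is a minimal sketch
of how a sender might walk the data extents of a local file. The helper names
next_data_extent and print_data_extents are hypothetical and not part of
OpenSSH. The sketch assumes a platform where lseek(2) fails with ENXIO when
only a hole remains before EOF (as Linux documents), and it treats the rest of
the file as data where SEEK_HOLE/SEEK_DATA are not defined.

```c
#define _GNU_SOURCE		/* exposes SEEK_DATA/SEEK_HOLE on glibc */
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not OpenSSH code): find the next data extent at or
 * after "offset" in a file of "size" bytes.  Returns 1 and fills in
 * *start and *end (end is exclusive) if data was found, 0 if only a hole
 * or EOF remains, and -1 on error with errno set.  Moves the file offset.
 */
int
next_data_extent(int fd, off_t offset, off_t size, off_t *start, off_t *end)
{
	assert(start != NULL && end != NULL);
	if (offset >= size)
		return 0;
#if defined(SEEK_HOLE) && defined(SEEK_DATA)
	off_t d, h;

	if ((d = lseek(fd, offset, SEEK_DATA)) == -1) {
		/* ENXIO: no data at or after offset, i.e. a trailing hole. */
		return errno == ENXIO ? 0 : -1;
	}
	/* Every file has an implicit hole at EOF, so this finds the end. */
	if ((h = lseek(fd, d, SEEK_HOLE)) == -1)
		return -1;
	*start = d;
	*end = h;
#else
	/* No sparse file support: assume data spans offset to EOF. */
	(void)fd;
	*start = offset;
	*end = size;
#endif
	return 1;
}

/*
 * Print every data extent in the file.  Returns the number of extents
 * found, or -1 on error.
 */
int
print_data_extents(int fd, off_t size)
{
	off_t pos = 0, start, end;
	int r, n = 0;

	while ((r = next_data_extent(fd, pos, size, &start, &end)) == 1) {
		printf("data [%lld, %lld)\n", (long long)start, (long long)end);
		pos = end;
		n++;
	}
	return r == -1 ? -1 : n;
}
```

Calling print_data_extents(fd, sb.st_size) on a fully written file should
report a single extent covering the whole file. What it reports for a sparse
file depends on how the underlying filesystem tracks holes.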
diff --git a/sftp-client.c b/sftp-client.c
index be40d2097..97f7cc26c 100644
--- a/sftp-client.c
+++ b/sftp-client.c
@@ -2028,6 +2028,32 @@ sftp_download_dir(struct sftp_conn *conn, const char *src, const char *dst,
return ret;
}
+/*
+ * Check a potentially-sparse file for location of holes and data, starting
+ * from "offset". If the next hole points to EOF, there are no remaining holes.
+ */
+static void
+sftp_check_sparse_file(int fd, off_t offset, off_t *data_offset,
+ off_t *hole_offset)
+{
+#if defined(SEEK_HOLE) && defined(SEEK_DATA)
+ if ((*hole_offset = lseek(fd, offset, SEEK_HOLE)) == -1)
+ fatal_f("lseek(SEEK_HOLE): %s", strerror(errno));
+ if ((*data_offset = lseek(fd, offset, SEEK_DATA)) == -1)
+ fatal_f("lseek(SEEK_DATA): %s", strerror(errno));
+#else
+ /* No sparse file support, assume data spans start to end. */
+ *data_offset = offset;
+ if ((*hole_offset = lseek(fd, offset, SEEK_END)) == -1)
+ fatal_f("lseek(SEEK_END): %s", strerror(errno));
+#endif
+ if (lseek(fd, offset, SEEK_SET) == -1) /* restore cursor */
+ fatal_f("lseek(SEEK_SET): %s", strerror(errno));
+ debug3_f("offset %llu data_offset %llu hole_offset %llu",
+ (unsigned long long)offset, (unsigned long long)*data_offset,
+ (unsigned long long)*hole_offset);
+}
+
int
sftp_upload(struct sftp_conn *conn, const char *local_path,
const char *remote_path, int preserve_flag, int resume,
@@ -2035,7 +2061,7 @@ sftp_upload(struct sftp_conn *conn, const char *local_path,
{
int r, local_fd;
u_int openmode, id, status = SSH2_FX_OK, reordered = 0;
- off_t offset, progress_counter;
+ off_t offset, data_offset = 0, hole_offset = 0, progress_counter;
u_char type, *handle, *data;
struct sshbuf *msg;
struct stat sb;
@@ -2044,7 +2070,7 @@ sftp_upload(struct sftp_conn *conn, const char *local_path,
u_int64_t highwater = 0, maxack = 0;
struct request *ack = NULL;
struct requests acks;
- size_t handle_len;
+ size_t handle_len, maxread = SIZE_T_MAX;
debug2_f("upload local \"%s\" to remote \"%s\"",
local_path, remote_path);
@@ -2122,6 +2148,43 @@ sftp_upload(struct sftp_conn *conn, const char *local_path,
for (;;) {
int len;
+ if (hole_offset < sb.st_size) {
+ sftp_check_sparse_file(local_fd, offset, &data_offset,
+ &hole_offset);
+
+ /*
+ * If source is sparse, truncate destination at first
+ * hole, since we won't overwrite those bytes.
+ */
+ if (offset == 0 && hole_offset != sb.st_size) {
+ debug_f("Sparse file with first hole at %llu, "
+ "truncating destination at %llu",
+ (unsigned long long)hole_offset,
+ (unsigned long long)hole_offset);
+ attrib_clear(&t);
+ t.flags = SSH2_FILEXFER_ATTR_SIZE;
+ t.size = hole_offset;
+ sftp_fsetstat(conn, handle, handle_len, &t);
+ }
+ if (data_offset > hole_offset) {
+ /* We are in a hole. */
+ debug3_f("Sparse file, hole at %llu, %llu bytes",
+ (unsigned long long)offset,
+ (unsigned long long)(data_offset - offset));
+ offset = data_offset;
+ maxread = SIZE_T_MAX;
+ continue;
+ }
+ /*
+ * We are in a data section. Read up to the end of the
+ * data section.
+ */
+ maxread = hole_offset - offset;
+ debug_f("Sparse file, data at %llu, %llu bytes",
+ (unsigned long long)offset,
+ (unsigned long long)maxread);
+ }
+
/*
* Can't use atomicio here because it returns 0 on EOF,
* thus losing the last block of the file.
@@ -2131,7 +2194,7 @@ sftp_upload(struct sftp_conn *conn, const char *local_path,
if (interrupted || status != SSH2_FX_OK)
len = 0;
else do
- len = read(local_fd, data, conn->upload_buflen);
+ len = read(local_fd, data, MIN(conn->upload_buflen, maxread));
while ((len == -1) &&
(errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK));
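
Skipping holes on the sending side only helps if the receiving side leaves
the unwritten ranges unallocated. The following is a minimal sketch of that
receiving half on a POSIX system, assuming a filesystem that supports sparse
files. The helpers write_extent and set_final_size are hypothetical names for
illustration and are not taken from sftp-server. Each data extent is written
at its absolute offset, and the final size is set explicitly because a
trailing hole has no data after it that would extend the file.

```c
#define _POSIX_C_SOURCE 200809L	/* pwrite, ftruncate, fstat */
#include <sys/types.h>
#include <sys/stat.h>
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "len" bytes from "buf" at absolute "offset",
 * retrying on short writes and EINTR.  Returns 0 on success, -1 on error.
 * Ranges that are never written stay as holes where the filesystem
 * supports them, and read back as zeros either way.
 */
int
write_extent(int fd, const void *buf, size_t len, off_t offset)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, offset);

		if (n == -1 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
		offset += n;
	}
	return 0;
}

/*
 * Hypothetical helper: set the final file size so that a trailing hole
 * is represented.  Returns 0 on success, -1 on error.
 */
int
set_final_size(int fd, off_t size)
{
	return ftruncate(fd, size);
}
```

For example, write_extent(fd, "hello", 5, 1 << 20) followed by
set_final_size(fd, 2 << 20) yields a 2 MiB file whose only written data is
the five bytes at the 1 MiB mark.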
--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.