Parallel transfers with sftp (call for testing / advice)

Peter Stuge peter at
Sun May 10 00:57:00 AEST 2020


Cyril Servant wrote:
> > I think you make a compelling argument. I admit that I haven't
> > reviewed the patch, even though that is what matters the most.
> If you want to review the code, here is a direct link to the patch:

Thanks for that, I've pulled your branch and reviewed locally.

I'm sorry, but I have to say that I think this has zero chance of going
upstream as-is, and it will probably be problematic even with further work.
I'll come back to why.

As-is there are way too many things going on in this proposal at once, and it
seems to make some assumptions which do hold true on Linux and probably other
systems but are not as portable as the rest of the codebase, and thus can't
work on some systems where OpenSSH currently works.

My review feedback follows. Note that I'm not an OpenSSH developer, but have
followed the project for a long time. Some feedback is specific, but most is
rather general. Your change viewed in isolation is not terrible, but I hope to
create an understanding for why the proposal is far too different from the
existing code to be able to go upstream, so that you may be able to avoid
similar situations in the future.

* A new thread queue infrastructure

Please forget about using this pattern in the OpenSSH context. Others have
mentioned that threading requires much more care, and while that is true I
think a simple thread queue is doable, although I think there's an
issue or two still overlooked in the proposed implementation (specifically,
threads combined with child processes especially requires carefully dealing
with signals in all threads, which the change doesn't seem to do, and the
thread_loop() logic is not clear - the function will never return).

Anyway, the important point is that a thread queue is utterly different
from the architecture of the entire OpenSSH codebase, it is not nearly as
portable, and introducing such a different paradigm can not be justified
for gaining concurrency in the range of 10-63.

There already exists concurrency infrastructure which will scale fine into
that range, it is based on a select() call. If you do rework, please look
at if/how you can reuse that and do discuss what you find, in particular
discuss beforehand if you find that you would need to extend it.

* Established behavior is changed, an error condition arbitrarily introduced

The proposed change refuses file transfer if the local and remote files
have different sizes. This has never been a requirement before, and should
certainly not become one. It's also not necessary for what you want to do.
If a user says to transfer a file then the behavior always was and should
continue to be that whatever was there before on the recipient side is simply
overwritten.

* New flags are added to existing functions and new versions of some functions

If you ever find yourself creating a function called "real_$otherfunction"
you can be sure that there exists another, better approach.

The new flags and parameters are introduced on very many layers throughout the
code, in order to propagate higher level state down to the lowest level.
Sometimes that is unavoidable, but really do look hard for a simpler approach.

You also introduce new global state. Maybe it would be possible to combine
the two.

Ideally you'll find a way to change your code such that you do not actually
need to pass any new state so far down in a call stack. Try hard to confine
any changes within as narrow a call stack window as possible, ideally only
a single caller+callee level for each change. Does that make sense?

The existing concurrency infrastructure already solves this problem btw.

* Special casing is added for the new functionality

In a couple of places new code is added if the new functionality is
used, and the previous code is kept verbatim if not. This approach is
untenable in the long run, and indicates that the existing code would
need to change to also support the new functionality, or sometimes
even better: that the new code needs to go somewhere else, in to another
layer in the call stack, so that existing code and logic doesn't need to
change.

* Don't add server workarounds to the client

fake_opendir(), reverse_recurse_stat() and wait_availability() do not
seem to belong in the client. The comment about fake_opendir() says:

+ * Open remote directory in order to refresh the file system cache. This is
+ * useful in the case of a parallel upload to a distributed file system (such as
+ * NFS or Lustre)

To me this needs to go into the server; it is the server's responsibility
to present fresh and correct information about the files it has. I don't
believe that this was done for that purpose, but it does look a bit like
this workaround was put in the client to be able to claim that the server
component doesn't need to change, when apparently it does. :\

Personally I think this is rather pragmatism, and sparing yourself the
wait for a changed server to be in operation on all your relevant nodes.

* Ad-hoc name resolution and random server address choice

The sftp client is changed to always call getaddrinfo() on the server name
and pick a random address out of the result. This was not the case before,
otherwise the change would not have been necessary. Server name resolution
is another thing that is already done consistently in a particular way within
the codebase, but the proposed change ignores that and adds a one-off thing
to one specific code path. That is also untenable for the codebase in the
long run; a different approach is absolutely necessary.

* The thread queue uses "orders" while the existing codebase uses callbacks

This is again about reusing existing patterns in a codebase. Introducing
a completely new and different pattern is sometimes unavoidable, but then
it may make more sense to first refactor the existing codebase, to enable
the addition of some new functionality which would be made possible only
by the new pattern. I don't think this is the case here, however.

* Memory allocations without explanations and without obvious frees

In a few places it looks like new memory allocations are added without
corresponding frees, and there is no explanation e.g. in the commit message.
That makes me worry that leaks are introduced, which would be unfortunate,
since there are probably some folks with long-running client processes.
And the commit that claims to correct memory leaks is also not explained
and in fact looks like it could actually create another memory leak.

* Tests. Nice! :)

* The "optimized sparse file creation" is not obvious and not explained.

* scp ends up being affected by what is proposed as an sftp change.

This is another indicator that something may be changing in places where
it shouldn't.

* Form.

This is the least significant problem, but it does still matter.
There is extra whitespace here and there, and there are some unnecessary
casts, e.g. in free() calls.

Please don't understand the above points as some kind of checklist of things
to fix; as I said, that's not my mandate. I just want to contribute perspective,
because I think that your use case is valid, and I appreciate that you want to
contribute to OpenSSH.

Now why this solution may be problematic even with further work:

Given that at least part of the significant mismatch between proposed code
and existing code is due to the nature of the problem I think it may be a
bit tricky to find a solution contained completely within OpenSSH, at least
right away.

With that said, I still think that it makes sense to somehow allow OpenSSH
to work in CPU-bound situations such as yours. The question is how.

> > likely to introduce a whole load of not very clean error conditions
> > regarding reassembly, which need to be handled sensibly both within
> > the sftp client and on the interface to outside/calling processes.
> > Can you or Cyril say something about this?
> Indeed, reassembly must be handled properly.
I only understood in later emails that you actually don't handle reassembly
within OpenSSH, but offload that onto the kernel by using sparse files. I
think that's a wise choice.

But it obviously only works given kernel and filesystem support, I don't
know if it's OK to require that. And what happens without that support?

Another approach would be for the recipient to initially create the full
size file before any data transfer, but that comes at the cost of a probably
quite significant delay. (Maybe not though? Can you test?)

More importantly, this is significantly different from the current behavior;
a file transfer thus far creates or truncates the file on the recipient side
and appends received data, making the file grow, allowing e.g. humans to know
if the transfer was complete by only comparing sizes.

And there we've come full circle, arriving at transferring only part of a file
with sftp, the reassembly problem, and the topic of other tooling already
reliably handling things such as partial transfers, reassembly, and more...

> > And another thought - if the proposed patch and/or method indeed will not
> > go anywhere, would it still be helpful for you if the sftp client would
> > only expose the file offset functionality? That way, the complexity of
> > reassembly and the associated error handling doesn't enter into OpenSSH.
> There is no server side change in this patch, so I don't think we can talk
> about complexity of reassembly.

Indeed the complexity I had in mind isn't present in your proposal, but OTOH
there is plenty of other complexity that I didn't anticipate. Heh.

And while there is no server change, I think there probably should be,
instead of adding that fake_opendir() business to the client.

> Of course, exposing the file offset functionality would help creating a new
> software on top of the sftp client, but at the cost of simplicity.

Your proposed change is also anything but simple.

I think an approach that requires only a very small change has a much better
chance to get upstream, but it obviously also depends on what the patch
looks like. I would encourage you to give it a try however. I guess it
should not need to be a very large patch.

> And if you do this, in the case of "get", the new software will need to
> send a lot of "ls" commands in order to be aware of the directory structure,
> the file sizes in order to correctly split transfers

Right, recursive get could probably benefit from further support from the sftp
client, but start simple; what would the simple case look like, where a user
instructs the sftp program to transfer only a part of a file?

Thanks and kind regards

