Connection caching?

Dan Kegel dank at kegel.com
Sun May 2 17:41:37 EST 2004


Hey all,
on the distcc mailing list, a thread about load balancing
got a bit out of hand, and we started thinking about
moving fsh-like connection caching into ssh itself,
to get rid of the overhead of starting up the Python
interpreter for every rsh-style invocation.
(Interestingly, MIT's "rex", described at
http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-884.pdf,
considers connection caching one of its advantages over ssh.)

Here are a few ideas, not quite boiled down to a proposal yet.

You'd run a local agent (maybe not ssh-agent, since that deals
with keys and wants to stay svelte, but in any case something
that is started just like ssh-agent and leaves its socket name
and pid in the environment) which could keep connections to
recently accessed machines open for a while, so that new
sessions could be opened instantly instead of requiring a
fresh cryptographic handshake.
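
To make the discovery step concrete, here's a minimal sketch of
how a client might locate such an agent, by analogy with the way
ssh-agent advertises SSH_AUTH_SOCK.  The variable name
SSH_CONNCACHE_SOCK is made up for this sketch; nothing defines
it today.

    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    /* Find the (hypothetical) cache agent via the environment and
     * connect to its Unix-domain socket.  Returns a connected fd,
     * or -1 if no agent is running (caller falls back to a normal
     * ssh connection). */
    int
    connect_to_cache(void)
    {
            const char *path = getenv("SSH_CONNCACHE_SOCK");
            struct sockaddr_un sun;
            int fd;

            if (path == NULL)
                    return -1;
            if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
                    return -1;
            memset(&sun, 0, sizeof(sun));
            sun.sun_family = AF_UNIX;
            strncpy(sun.sun_path, path, sizeof(sun.sun_path) - 1);
            if (connect(fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
                    close(fd);
                    return -1;
            }
            return fd;
    }
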
(I suppose you could cache the result of the handshake rather
than the actual connection, kind of like SSL session resumption,
but that doesn't sound like the ssh way of doing things.)

If ssh noticed the connection cache was there (i.e. its variable
is set in the environment), it would tell the cache where it
wanted to connect, and the cache would pass back an already-
connected fd ready to go.  The tricky part is: how would the
client hand the fd back when it's finished with the command?
Simplest would be for the client to just close the connection,
but then the connection cache would be more like a connection
prefetcher; it would have to start connections before being
asked for them.
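
Handing a connected fd from the agent to the client is the
standard descriptor-passing trick on Unix-domain sockets.  A
minimal sketch of the agent's side, using SCM_RIGHTS ancillary
data (the one-byte payload and the request protocol around this
call are made up):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Pass an already-connected descriptor conn_fd to the client
     * at the other end of agent_sock.  Returns 0 on success. */
    int
    send_fd(int agent_sock, int conn_fd)
    {
            char dummy = 'F';
            struct iovec iov = { &dummy, 1 };
            union {
                    struct cmsghdr hdr;
                    char buf[CMSG_SPACE(sizeof(int))];
            } u;
            struct msghdr msg;
            struct cmsghdr *cmsg;

            memset(&msg, 0, sizeof(msg));
            msg.msg_iov = &iov;
            msg.msg_iovlen = 1;
            msg.msg_control = u.buf;
            msg.msg_controllen = sizeof(u.buf);
            cmsg = CMSG_FIRSTHDR(&msg);
            cmsg->cmsg_level = SOL_SOCKET;
            cmsg->cmsg_type = SCM_RIGHTS;
            cmsg->cmsg_len = CMSG_LEN(sizeof(int));
            memcpy(CMSG_DATA(cmsg), &conn_fd, sizeof(int));
            return sendmsg(agent_sock, &msg, 0) == 1 ? 0 : -1;
    }

The receiving side does the matching recvmsg() and pulls the new
fd out of the control message.
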
Alternatively, if all traffic were actually sent via the local
agent, it could just keep a single TCP connection open to the
remote host no matter how many streams were active, and
multiplex them all.  That means one extra copy of the data,
but it gets rid of the need for any psychic powers on the
part of the local agent.  And (bonus!) it means all the
smarts are in the local agent, and apps can just talk to it
directly instead of forking an ssh (or loading an ssh library).
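
The multiplexing itself needn't be fancy; the SSH-2 protocol
already multiplexes channels over one connection, so a real
implementation would presumably just reuse that.  But to show
how little framing the agent needs on its local side, here's a
sketch with a made-up 8-byte stream-id/length header:

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    /* Write one frame: 4-byte stream id, 4-byte payload length
     * (both big-endian), then the payload itself. */
    int
    write_frame(int fd, uint32_t stream_id, const void *buf, uint32_t len)
    {
            unsigned char hdr[8];
            uint32_t id_n = htonl(stream_id);
            uint32_t len_n = htonl(len);

            memcpy(hdr, &id_n, 4);
            memcpy(hdr + 4, &len_n, 4);
            if (write(fd, hdr, sizeof(hdr)) != (ssize_t)sizeof(hdr))
                    return -1;
            if (write(fd, buf, len) != (ssize_t)len)
                    return -1;
            return 0;
    }

The demultiplexer on the other side reads the header, then reads
exactly len bytes and dispatches them to the right stream.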

It might also be easy for this agent to do simple load balancing:
if the hostname given is the name of a cluster of ssh servers
rather than a single real server, the agent would hand the
command to the least loaded of them.  That would come in handy
for distcc, and would keep people from trying to use distcc
as a general-purpose job distributor :-)
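
The selection policy could be as dumb as picking the member for
which the agent itself has the fewest sessions open.  A sketch,
with made-up bookkeeping (note this measures the agent's own
usage, not actual remote load):

    #include <stddef.h>

    /* Per-host bookkeeping the agent would keep for each member
     * of a named cluster (invented for this sketch). */
    struct member {
            const char *host;
            int active_sessions;
    };

    /* Return the host with the fewest sessions currently open
     * through this agent, or NULL for an empty cluster. */
    const char *
    pick_least_loaded(const struct member *m, int n)
    {
            int i, best = 0;

            for (i = 1; i < n; i++)
                    if (m[i].active_sessions < m[best].active_sessions)
                            best = i;
            return (n > 0) ? m[best].host : NULL;
    }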

OK, glad I got that off my chest.  Maybe if I sleep on it,
I'll realize which way to go with it.
- Dan



