SSH Compression - Block Deduplication
Matt Olson
molson at atlantis.oceanconsulting.com
Sat Sep 10 03:03:23 EST 2011
Hello,
I did a search against the list archive and didn't see any comments on the
topic of using deduplication as a compression algorithm. This is just a
suggestion for the developers to think about. Block deduplication is
found in higher-end WAN gear. The idea is to maintain an indexed
dictionary of data blocks. If a matching block is found prior to
transmission, a token (hash id) is sent in its place to reduce the amount
of data transmitted. Each side of the connection builds and maintains
this dictionary. This would be done to data blocks before they are
encrypted.
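To make the idea concrete, here is a minimal sender-side sketch in C.
The fixed 4 KB block size, the toy FNV-1a hash, the tiny dictionary, and
all of the names are assumptions chosen for illustration; this is not
OpenSSH code, and a real implementation would want a stronger hash and a
proper eviction policy.

    /* Sender-side block deduplication sketch.  Block size, hash, and
     * dictionary size are illustrative only. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 4096
    #define DICT_SLOTS 4096           /* power of two; small for the sketch */

    struct dict_entry {
        uint64_t hash;
        unsigned char data[BLOCK_SIZE];
        int used;
    };

    static struct dict_entry dict[DICT_SLOTS];

    /* Toy 64-bit FNV-1a hash; stands in for a real block digest. */
    static uint64_t fnv1a(const unsigned char *p, size_t n)
    {
        uint64_t h = 14695981039346656037ULL;
        while (n--) {
            h ^= *p++;
            h *= 1099511628211ULL;
        }
        return h;
    }

    /* Returns 1 and sets *token if the block was seen before; otherwise
     * stores the block and returns 0, meaning "send it literally". */
    int dedup_block(const unsigned char *block, uint64_t *token)
    {
        uint64_t h = fnv1a(block, BLOCK_SIZE);
        size_t slot = h & (DICT_SLOTS - 1);

        if (dict[slot].used && dict[slot].hash == h &&
            memcmp(dict[slot].data, block, BLOCK_SIZE) == 0) {
            *token = h;               /* match: send token, not data */
            return 1;
        }
        dict[slot].hash = h;          /* miss or collision: take the slot */
        memcpy(dict[slot].data, block, BLOCK_SIZE);
        dict[slot].used = 1;
        return 0;
    }

    int main(void)
    {
        unsigned char block[BLOCK_SIZE] = { 'x' };  /* rest zero-filled */
        uint64_t token;

        printf("first send:  %s\n",
               dedup_block(block, &token) ? "token" : "literal");
        printf("second send: %s\n",
               dedup_block(block, &token) ? "token" : "literal");
        return 0;
    }

The receiver would run the same insertion and overwrite rules on every
literal block it receives, so the two dictionaries stay in sync; and
because the sender compares actual block contents before emitting a
token, a hash collision only costs a missed match, never corrupted data.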
Optimal systems go beyond using fixed block sizes and look for
variable-length data blocks that occur frequently. For an initial
implementation, fixed block sizes are probably fine.
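For what it's worth, the usual way to find those variable-length blocks
is content-defined chunking with a rolling hash, so that boundaries
realign after an insertion instead of shifting every block that follows.
Below is a minimal sketch of a gear-style rolling hash; the average
chunk size, the min/max bounds, and the fixed seed are all assumptions
picked for illustration (both sides would have to agree on them):

    /* Content-defined chunking sketch using a gear-style rolling hash. */
    #include <stdint.h>
    #include <stdio.h>

    #define MASK      0x0fffU         /* boundary odds 1/4096: ~4 KB chunks */
    #define MIN_CHUNK 1024
    #define MAX_CHUNK 16384

    static uint64_t gear[256];

    /* Fill the byte-to-random-constant table from a fixed seed so that
     * sender and receiver cut chunks at the same boundaries. */
    static void init_gear(void)
    {
        uint64_t s = 0x9e3779b97f4a7c15ULL;
        for (int i = 0; i < 256; i++) {
            s ^= s << 13; s ^= s >> 7; s ^= s << 17;   /* xorshift64 */
            gear[i] = s;
        }
    }

    /* Scan forward and return the length of the next chunk.  A boundary
     * is declared where the low bits of the rolling hash are zero. */
    static size_t next_chunk_len(const unsigned char *p, size_t n)
    {
        uint64_t h = 0;
        size_t i;

        for (i = 0; i < n && i < MAX_CHUNK; i++) {
            h = (h << 1) + gear[p[i]];
            if (i >= MIN_CHUNK && (h & MASK) == 0)
                return i + 1;         /* content-defined boundary */
        }
        return i;                     /* cap at MAX_CHUNK (or end of data) */
    }

    int main(void)
    {
        unsigned char buf[65536];
        size_t off = 0;

        init_gear();
        for (size_t i = 0; i < sizeof(buf); i++)
            buf[i] = (unsigned char)(i * 31 + (i >> 8));  /* arbitrary data */

        while (off < sizeof(buf)) {
            size_t n = next_chunk_len(buf + off, sizeof(buf) - off);
            printf("chunk at %zu, length %zu\n", off, n);
            off += n;
        }
        return 0;
    }

Because a boundary depends only on the bytes near it, inserting data
early in a stream shifts at most a chunk or two; everything after the
next boundary hashes to the same blocks and deduplicates as before.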
In the WAN scenario, the dictionaries are retained and built up over time
to optimize performance.
This may run somewhat contrary to the goals of OpenSSH, namely privacy.
Retention of data in a memory or disk cache may not be desirable.
However, certain workloads would stand to benefit from this type of
compression.
I would suggest a default of a per-session-only cache/dictionary, with a
switch to enable a persistent disk-based cache/dictionary. Dictionaries
would probably be maintained separately on a per-host basis.
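Purely to illustrate what the knobs might look like, a hypothetical
per-host configuration could read like the sketch below. None of these
option names exist in OpenSSH today; they are invented here only to show
the proposed defaults:

    # Hypothetical ssh_config sketch -- option names are invented.
    Host build-server
        DedupCompression     yes            # per-session dictionary (default)
        DedupPersistentCache yes            # opt in to an on-disk dictionary
        DedupCacheDir        ~/.ssh/dedup   # one dictionary per remote host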
My hope is that this sort of compression would greatly help remote X11
display. Latency and the chatty nature of X11 are still an issue, but
this would help with the painting of bitmap patterns.
Matt