SSH Compression - Block Deduplication
Matt Olson
molson at atlantis.oceanconsulting.com
Sat Sep 10 03:03:23 EST 2011
Hello,
I did a search against the list archive and didn't see any comments on the
topic of using deduplication as a compression algorithm. This is just a
suggestion for the developers to think about. Block deduplication is
found in higher-end WAN gear. The idea is to maintain an indexed
dictionary of data blocks. If a matching block is found prior to
transmission, a token (hash id) is sent in its place to reduce the amount
of data transmitted. Each side of the connection builds and maintains
this dictionary. This would be done to data blocks before they are
encrypted.
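To make the idea concrete, here is a minimal sender-side sketch in C.
The fixed 4 KB block size, the toy FNV-1a hash, the tiny dictionary, and
all of the names are assumptions chosen for illustration; this is not
OpenSSH code, and a real implementation would want a stronger hash and a
proper eviction policy.

    /* Sender-side block deduplication sketch.  Block size, hash, and
     * dictionary size are illustrative only. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 4096
    #define DICT_SLOTS 4096           /* power of two; small for the sketch */

    struct dict_entry {
        uint64_t hash;
        unsigned char data[BLOCK_SIZE];
        int used;
    };

    static struct dict_entry dict[DICT_SLOTS];

    /* Toy 64-bit FNV-1a hash; stands in for a real block digest. */
    static uint64_t fnv1a(const unsigned char *p, size_t n)
    {
        uint64_t h = 14695981039346656037ULL;
        while (n--) {
            h ^= *p++;
            h *= 1099511628211ULL;
        }
        return h;
    }

    /* Returns 1 and sets *token if the block was seen before; otherwise
     * stores the block and returns 0, meaning "send it literally". */
    int dedup_block(const unsigned char *block, uint64_t *token)
    {
        uint64_t h = fnv1a(block, BLOCK_SIZE);
        size_t slot = h & (DICT_SLOTS - 1);

        if (dict[slot].used && dict[slot].hash == h &&
            memcmp(dict[slot].data, block, BLOCK_SIZE) == 0) {
            *token = h;               /* match: send token, not data */
            return 1;
        }
        dict[slot].hash = h;          /* miss or collision: take the slot */
        memcpy(dict[slot].data, block, BLOCK_SIZE);
        dict[slot].used = 1;
        return 0;
    }

    int main(void)
    {
        unsigned char block[BLOCK_SIZE] = { 'x' };  /* rest zero-filled */
        uint64_t token;

        printf("first send:  %s\n",
               dedup_block(block, &token) ? "token" : "literal");
        printf("second send: %s\n",
               dedup_block(block, &token) ? "token" : "literal");
        return 0;
    }

The receiver would run the same insertion and overwrite rules on every
literal block it receives, so the two dictionaries stay in sync; and
because the sender compares actual block contents before emitting a
token, a hash collision only costs a missed match, never corrupted data.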
Optimal systems go beyond using fixed block sizes and look for
variable-length data blocks that occur frequently. For an initial
implementation, fixed block sizes are probably fine.
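For what it's worth, the usual way to find those variable-length blocks
is content-defined chunking with a rolling hash, so that boundaries
realign after an insertion instead of shifting every block that follows.
Below is a minimal sketch of a gear-style rolling hash; the average
chunk size, the min/max bounds, and the fixed seed are all assumptions
picked for illustration (both sides would have to agree on them):

    /* Content-defined chunking sketch using a gear-style rolling hash. */
    #include <stdint.h>
    #include <stdio.h>

    #define MASK      0x0fffU         /* boundary odds 1/4096: ~4 KB chunks */
    #define MIN_CHUNK 1024
    #define MAX_CHUNK 16384

    static uint64_t gear[256];

    /* Fill the byte-to-random-constant table from a fixed seed so that
     * sender and receiver cut chunks at the same boundaries. */
    static void init_gear(void)
    {
        uint64_t s = 0x9e3779b97f4a7c15ULL;
        for (int i = 0; i < 256; i++) {
            s ^= s << 13; s ^= s >> 7; s ^= s << 17;   /* xorshift64 */
            gear[i] = s;
        }
    }

    /* Scan forward and return the length of the next chunk.  A boundary
     * is declared where the low bits of the rolling hash are zero. */
    static size_t next_chunk_len(const unsigned char *p, size_t n)
    {
        uint64_t h = 0;
        size_t i;

        for (i = 0; i < n && i < MAX_CHUNK; i++) {
            h = (h << 1) + gear[p[i]];
            if (i >= MIN_CHUNK && (h & MASK) == 0)
                return i + 1;         /* content-defined boundary */
        }
        return i;                     /* cap at MAX_CHUNK (or end of data) */
    }

    int main(void)
    {
        unsigned char buf[65536];
        size_t off = 0;

        init_gear();
        for (size_t i = 0; i < sizeof(buf); i++)
            buf[i] = (unsigned char)(i * 31 + (i >> 8));  /* arbitrary data */

        while (off < sizeof(buf)) {
            size_t n = next_chunk_len(buf + off, sizeof(buf) - off);
            printf("chunk at %zu, length %zu\n", off, n);
            off += n;
        }
        return 0;
    }

Because a boundary depends only on the bytes near it, inserting data
early in a stream shifts at most a chunk or two; everything after the
next boundary hashes to the same blocks and deduplicates as before.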
In the WAN scenario, the dictionaries are retained and built up over time
to optimize performance.
This may run somewhat contrary to the goals of OpenSSH, namely privacy.
Retention of data in a memory or disk cache may not be desirable.
However, certain workloads would stand to benefit from this type of
compression.
I would suggest a default of a per-session-only cache/dictionary, with a
switch to enable a persistent disk-based cache/dictionary. Dictionaries
would probably be maintained separately on a per-host basis.
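Purely to illustrate what the knobs might look like, a hypothetical
per-host configuration could read like the sketch below. None of these
option names exist in OpenSSH today; they are invented here only to show
the proposed defaults:

    # Hypothetical ssh_config sketch -- option names are invented.
    Host build-server
        DedupCompression     yes            # per-session dictionary (default)
        DedupPersistentCache yes            # opt in to an on-disk dictionary
        DedupCacheDir        ~/.ssh/dedup   # one dictionary per remote host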
My hope is that this sort of compression would greatly help remote X11
display. Latency and the chatty nature of X11 are still an issue, but
this would help with the painting of bitmap patterns.
Matt