SSH Compression - Block Deduplication

Dan Kaminsky dan at doxpara.com
Sat Sep 10 08:56:58 EST 2011


Deduplication is a fairly domain-specific technology -- it works well for
files that are shared across multiple users (a context that's about as
distant from SSH as you can imagine) and for a few protocols that move large
blocks of data repeatedly (X11).  But even in the latter case, efficient
protocols like NoMachine's do much better, as they're deeply protocol-aware.
SSH is most likely the wrong layer for this work -- in general, gzip
captures most of the compression gain, and in the specific cases where it
doesn't, the work is too specific to be done generically (hence the special
accelerators for each protocol in WAN gear).
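
One rough way to check the gzip point against a given workload is to run a
sample through zlib (the DEFLATE library gzip is built on) and compare
sizes.  The sketch below is only an illustration: the helper name
deflate_ratio and the two synthetic samples are assumptions, not anything
taken from OpenSSH.  It contrasts repeats that sit close together with
repeats that are separated by more than DEFLATE's 32 KiB window.

```python
import os
import zlib


def deflate_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size / original size (smaller means better)."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


# Hypothetical sample data: one random 4 KiB block, repeated 8 times.
block = os.urandom(4096)

# Repeats are adjacent, so each copy is well inside the 32 KiB window.
near = block * 8

# Each copy is followed by 64 KiB of random filler, so the previous copy
# is out of the window by the time the next one arrives.
far = b"".join(block + os.urandom(65536) for _ in range(8))

print(f"near repeats: {deflate_ratio(near):.3f}")
print(f"far repeats:  {deflate_ratio(far):.3f}")
```

Nearby repeats shrink considerably, while the widely spaced ones barely
compress at all; the latter is the narrow case where a block dictionary
could add something on top of gzip.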

On Fri, Sep 9, 2011 at 10:03 AM, Matt Olson <
molson at atlantis.oceanconsulting.com> wrote:

> Hello,
>
> I did a search against the list archive and didn't see any comments on the
> topic of using deduplication as a compression algorithm.  This is just a
> suggestion for the developers to think about.  Block deduplication is found
> in higher-end WAN gear.  The idea is to maintain an indexed dictionary of
> data blocks.  If a matching block is found prior to transmission, a token
> (hash ID) is sent in its place to reduce the amount of data transmitted.
> Each side of the connection builds and maintains this dictionary.  This
> would be done to data blocks before they are encrypted.
>
> Optimal systems go beyond using fixed block sizes and look for
> variable-length data blocks that occur frequently.  For an initial
> implementation, fixed block sizes are probably fine.
>
> In the WAN scenario, the dictionaries are retained and built up over time
> to optimize performance.
>
> This may run somewhat contrary to the goals of OpenSSH, namely privacy.
> Retention of data in a memory or disk cache may not be desirable.  However,
> certain workloads would stand to benefit from this type of compression.
>
> I would suggest a default of a per-session-only cache/dictionary, with a
> switch to enable a persistent disk-based cache/dictionary.  Dictionaries
> would probably be maintained separately on a per-host basis.
>
> My hope is that this sort of compression would greatly help remote X11
> display.  Latency and the chatty nature of X11 are still issues, but this
> would help with the painting of bitmap patterns.
>
> Matt
>
>
>
> _______________________________________________
> openssh-unix-dev mailing list
> openssh-unix-dev at mindrot.org
> https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
>
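
For readers who want to see the fixed-block scheme from the quoted message
in concrete terms, here is a minimal sketch.  Everything in it is an
assumption made for illustration: the 4096-byte block size, SHA-256 as the
"hash id", the ("raw", ...) / ("ref", ...) message tuples, and the class
names DedupSender and DedupReceiver.  None of it is OpenSSH code or a
proposed wire format.  It models the per-session dictionary only, and it
operates on plaintext, i.e. before encryption would be applied.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; the proposal does not name one


class DedupSender:
    """Replaces blocks already sent in this session with their hash."""

    def __init__(self):
        self.seen = set()  # digests of blocks the peer already holds

    def encode(self, data: bytes):
        messages = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            if digest in self.seen:
                # Known block: send the 32-byte token instead of the data.
                messages.append(("ref", digest))
            else:
                self.seen.add(digest)
                messages.append(("raw", block))
        return messages


class DedupReceiver:
    """Builds the same dictionary from raw blocks and resolves tokens."""

    def __init__(self):
        self.blocks = {}  # digest -> block contents

    def decode(self, messages) -> bytes:
        parts = []
        for kind, payload in messages:
            if kind == "raw":
                self.blocks[hashlib.sha256(payload).digest()] = payload
                parts.append(payload)
            else:
                parts.append(self.blocks[payload])
        return b"".join(parts)


# Hypothetical workload: the same 4096-byte bitmap tile painted three times.
sender, receiver = DedupSender(), DedupReceiver()
tile = bytes(range(256)) * 16
payload = tile * 3 + b"tail"

messages = sender.encode(payload)
restored = receiver.decode(messages)
print([kind for kind, _ in messages], restored == payload)
```

Both dictionaries live only as long as the two objects do, which matches
the per-session default suggested above.  A persistent per-host cache would
have to store plaintext blocks on disk, which is the privacy concern the
message itself raises.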

