SSH Compression - Block Deduplication
Matt Olson
molson at atlantis.oceanconsulting.com
Tue Sep 13 05:55:30 EST 2011
Hi Gert,
Let me start by saying I'm not an expert in gzip compression internals.
For others to read along:
http://www.gzip.org/algorithm.txt
(RE: LZ77) A distance of 32KB and match lengths (essentially a variable
block size) of 258 bytes are both quite small when talking about graphics
data. With modern processors and memory, it would be interesting to see
how this performs with a distance of 4MB and a length of 32KB. Those fit
well within modern L2 and L1 caches respectively.
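For context, the 32KB distance is baked into the DEFLATE format itself:
zlib caps windowBits at 15, i.e. a 2^15 = 32KB back-reference window, so
a 4MB distance would mean a different wire format, not just a tuning
knob. Here's a rough toy test of my own (not anything from OpenSSH)
showing what the limit means in practice: a repeated 4KB block compresses
away when its two copies sit 16KB apart, but not when they are 64KB apart.

/* Compare how well zlib compresses a repeated 4KB block when the two
 * copies are 16KB apart (inside the 32KB window) versus 64KB apart
 * (outside it). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

static uLong compressed_size(const unsigned char *src, uLong len)
{
    uLong bound = compressBound(len);
    unsigned char *dst = malloc(bound);
    uLongf dlen = bound;
    if (compress2(dst, &dlen, src, len, Z_BEST_COMPRESSION) != Z_OK)
        dlen = 0;
    free(dst);
    return dlen;
}

static void trial(size_t gap)
{
    size_t len = 4096 + gap + 4096;
    unsigned char *buf = malloc(len);
    unsigned char block[4096];
    size_t i;

    /* Pseudo-random 4KB block, copied once more after 'gap' bytes of
     * random filler. */
    for (i = 0; i < sizeof(block); i++)
        block[i] = (unsigned char)(rand() & 0xff);
    memcpy(buf, block, sizeof(block));
    for (i = 0; i < gap; i++)
        buf[4096 + i] = (unsigned char)(rand() & 0xff);
    memcpy(buf + 4096 + gap, block, sizeof(block));

    printf("gap %6zu bytes: %lu -> %lu compressed\n",
           gap, (unsigned long)len,
           (unsigned long)compressed_size(buf, len));
    free(buf);
}

int main(void)
{
    srand(1);
    trial(16 * 1024);   /* second copy within the 32KB LZ77 window */
    trial(64 * 1024);   /* second copy beyond it: no back-reference */
    return 0;
}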
Of course the actual distance and length values balance a race between
CPU time and network latency. Example: if it takes 500ms to search the
last 4MB for duplicates on a link with 100ms latency, then you really
haven't gained anything in apparent speed; you have only conserved
bandwidth.
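To put rough numbers on that race (purely hypothetical figures):
compression only improves apparent speed when the CPU time it adds is
smaller than the transfer time it removes.

#include <stdio.h>

int main(void)
{
    /* Hypothetical numbers, purely for illustration. */
    double latency_s  = 0.100;              /* 100 ms latency           */
    double bandwidth  = 100e6 / 8.0;        /* 100 Mbit/s, in bytes/s   */
    double payload    = 4.0 * 1024 * 1024;  /* 4MB of screen data       */
    double ratio      = 0.25;               /* assume 4:1 compression   */
    double cpu_time_s = 0.500;              /* 500 ms to search 4MB     */

    double t_plain = latency_s + payload / bandwidth;
    double t_comp  = latency_s + cpu_time_s + (payload * ratio) / bandwidth;

    printf("uncompressed: %.2f s\n", t_plain);
    printf("compressed:   %.2f s\n", t_comp);
    printf("compression %s apparent speed\n",
           t_comp < t_plain ? "improves" : "hurts");
    return 0;
}

With those made-up numbers the compressed path is slower end to end,
even though it moves a quarter of the bytes.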
WAN accelerator deduplication data dictionaries are much larger and can
cache patterns found across an entire session (or multiple sessions).
However, LZ77 with larger distance and length values does have the speed
advantage of not having to go to main memory or disk. I think 4MB/32KB
would be useful with X11 and would make an interesting test.
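For anyone curious what a per-block dictionary looks like next to gzip's
per-byte-sequence matching, here is a toy sketch (fixed 4KB blocks,
hash-only lookups; a real accelerator would use content-defined chunking
and verify the stored bytes before trusting a hash match):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define DICT_SLOTS 65536          /* dictionary of previously seen blocks */

struct dict_entry {
    uint64_t hash;
    int      used;
};

static struct dict_entry dict[DICT_SLOTS];

/* FNV-1a: a simple, well-known non-cryptographic hash. */
static uint64_t fnv1a(const unsigned char *p, size_t n)
{
    uint64_t h = 1469598103934665603ULL;
    size_t i;
    for (i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Returns 1 if the block was already in the dictionary (send a short
 * reference), 0 if it is new (send the literal block and remember it).
 * Hash-only: a real implementation would also compare stored bytes to
 * guard against collisions. */
static int dedup_block(const unsigned char *block)
{
    uint64_t h = fnv1a(block, BLOCK_SIZE);
    size_t slot = (size_t)(h % DICT_SLOTS);

    if (dict[slot].used && dict[slot].hash == h)
        return 1;

    dict[slot].hash = h;
    dict[slot].used = 1;
    return 0;
}

int main(void)
{
    unsigned char a[BLOCK_SIZE], b[BLOCK_SIZE];
    memset(a, 0xAA, sizeof(a));
    memset(b, 0x55, sizeof(b));

    printf("block a: %s\n", dedup_block(a) ? "duplicate" : "literal");
    printf("block b: %s\n", dedup_block(b) ? "duplicate" : "literal");
    printf("block a: %s\n", dedup_block(a) ? "duplicate" : "literal");
    return 0;
}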
Matt
On Mon, 12 Sep 2011, Gert Doering wrote:
> Hi,
>
> On Mon, Sep 12, 2011 at 08:26:41AM -0700, Matt Olson wrote:
>> I may look around and see if I can find a library that does another layer
>> of tunneling or a Xorg addon to provide deduplication.
>
> Doesn't gzip compression suit your needs? This already does fairly
> thorough deduplication - not on a "per block level" but on a "per byte
> sequence" level, so much more flexible...
>
> gert
> --
> USENET is *not* the non-clickable part of WWW!
> //www.muc.de/~gert/
> Gert Doering - Munich, Germany gert at greenie.muc.de
> fax: +49-89-35655025 gert at net.informatik.tu-muenchen.de
>