SSH Compression - Block Deduplication

Dan Kaminsky dan at doxpara.com
Tue Sep 13 07:09:04 EST 2011


I can't speak for everyone, but a new compression mode with 75% of the efficiency of NoMachine but 10% of the complexity wouldn't scare me, especially since it'd be a post-auth codebase. The thing to do would be to look at, say, 100MB of X traffic and see if there are indeed massive duplicated blocks in the stream.  
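
A rough way to run that check, as a sketch only: hash fixed-size, aligned blocks of a captured stream and count how many have been seen before. The function name, the 4096-byte block size, and the synthetic sample are all assumptions for illustration; getting a real capture of X traffic is not covered here, and aligned blocks will miss duplicates that are shifted by a few bytes, so this gives a lower bound rather than a definitive answer.

```python
import hashlib
import os


def duplicate_block_stats(data: bytes, block_size: int = 4096):
    """Return (total_blocks, duplicate_blocks) for aligned fixed-size blocks.

    A block counts as a duplicate if an identical block appeared earlier
    in the stream. A trailing partial block is ignored.
    """
    seen = set()
    total = 0
    duplicates = 0
    for offset in range(0, len(data) - block_size + 1, block_size):
        digest = hashlib.sha1(data[offset:offset + block_size]).digest()
        total += 1
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return total, duplicates


if __name__ == "__main__":
    # Synthetic stand-in for a capture: one 4096-byte "tile" sent three
    # times, followed by two blocks of unrelated random data.
    tile = os.urandom(4096)
    sample = tile * 3 + os.urandom(4096 * 2)
    total, duplicates = duplicate_block_stats(sample)
    print(f"{duplicates} of {total} blocks duplicate an earlier block")
```

To try it on real data, read the capture into a bytes object and pass it in place of the synthetic sample; a high duplicate fraction at several block sizes would suggest block deduplication is worth pursuing.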

Sent from my iPhone

On Sep 12, 2011, at 12:55 PM, Matt Olson <molson at atlantis.oceanconsulting.com> wrote:

> Hi Gert,
> 
> Let me start by saying I'm not an expert in gzip compression internals.
> 
> For others to read along:
> 
> http://www.gzip.org/algorithm.txt
> 
> (RE: LZ77) A distance of 32k and lengths (essentially variable blocks) of up to 258 bytes are both quite small when talking about graphics data.  With modern processors and memory, it would be interesting to see how this performs with a distance of 4MB and a length of 32k.  Those fit well within modern L2 and L1 caches respectively.
> 
> Of course the actual distance and length values balance a race between CPU time and network latency.  Example: if it takes 500ms to search the last 4MB for duplicates when your network has 100ms latency, then you really haven't gained anything in apparent speed; you have only conserved bandwidth.
> 
> WAN accelerator deduplication data dictionaries are much larger and can cache patterns found within the entire (or multiple) session(s).
> 
> However, LZ77 with larger distance and length values does have the speed advantage of not having to go to main memory or disk.  I think 4MB/32KB would be useful with X11 and would be an interesting test.
> 
> Matt
> 
> 
> On Mon, 12 Sep 2011, Gert Doering wrote:
> 
>> Hi,
>> 
>> On Mon, Sep 12, 2011 at 08:26:41AM -0700, Matt Olson wrote:
>>> I may look around and see if I can find a library that does another layer
>>> of tunneling or a Xorg addon to provide deduplication.
>> 
>> Doesn't gzip compression suit your needs?  This already does fairly
>> thorough deduplication - not on a "per block level" but on a "per byte
>> sequence" level, so much more flexible...
>> 
>> gert
>> -- 
>> USENET is *not* the non-clickable part of WWW!
>>                                                          //www.muc.de/~gert/
>> Gert Doering - Munich, Germany                             gert at greenie.muc.de
>> fax: +49-89-35655025                        gert at net.informatik.tu-muenchen.de
>> 
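
The 32k distance limit discussed in the quoted thread can be seen with a small experiment. The sketch below compresses a buffer in which the same 256KB of random data appears twice. zlib (DEFLATE, 32k window) cannot reach back far enough to notice the repeat, while Python's lzma module, whose default dictionary is several megabytes, can. Note that lzma is only a stand-in for "an LZ77-style compressor with a larger window"; it is not the 4MB/32KB configuration proposed above and is not being suggested for SSH. The function name and sizes are assumptions for illustration.

```python
import lzma
import os
import zlib


def compressed_sizes(data: bytes):
    """Return (zlib_size, lzma_size) in bytes for the same input."""
    zlib_size = len(zlib.compress(data, 9))  # DEFLATE, 32k window
    lzma_size = len(lzma.compress(data))     # default preset, multi-MB dictionary
    return zlib_size, lzma_size


if __name__ == "__main__":
    # Random data is incompressible on its own, so any saving comes only
    # from finding the second copy, which starts 256KB after the first.
    block = os.urandom(256 * 1024)
    data = block + block
    zlib_size, lzma_size = compressed_sizes(data)
    print(f"input: {len(data)} bytes")
    print(f"zlib:  {zlib_size} bytes")
    print(f"lzma:  {lzma_size} bytes")
```

The zlib output should come out about the same size as the input, and the lzma output close to half of it. This says nothing about CPU cost, which is the other half of the trade-off raised in the thread.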


More information about the openssh-unix-dev mailing list