Optional 'test' or benchmark cipher

Sat Jan 19 08:46:53 EST 2008

Chris Rapier wrote:
> 
> 
> Linda Walsh wrote:
> 
>>     BDP?  ...
> 
> Bandwidth delay product. Since network transfers aren't instantaneous 
> there is data in transit between two hosts at any time. Any path has a 
> maximum amount of this outstanding data it can hold which is determined 
> by multiplying the bandwidth by the round trip time. If you can keep as 
> much data in transit as the carrying capacity of the path you end up 
> using the network much more efficiently. So if you have a path with a 
> 2MB BDP and you only have 64KB of data in transit at any time you are 
> only using 3% of the network capacity.
---
	Ahhh.  I understand the concept.  I've just approached it
in a different way, since from an application point of view, the
"delay" (in my setup) is mostly due to OS and network-stack (OSI
model) inefficiencies.  My last "round" of network communication
"analysis" (limited to my internal network) showed that my worst
delays came from WinXP's network stack.  With 100Mb/s, I was
getting close to the limits of the media (80% saturation, which
I didn't think was "unreasonable" given off-the-shelf networking
stacks).
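
	To put numbers on the bandwidth delay product described in the
quoted text, here is a minimal sketch.  The function names are made up
for illustration, and the LAN figures (1 Gb/s, 0.2 ms round trip) are
assumptions, not measurements from this thread.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth delay product: bytes the path can hold in flight."""
    return bandwidth_bits_per_s * rtt_s / 8


def utilization(in_flight_bytes, bdp):
    """Fraction of path capacity used with in_flight_bytes outstanding."""
    return min(1.0, in_flight_bytes / bdp)


# The quoted example: a path with a 2MB BDP and only 64KB in transit.
quoted = utilization(64 * 1024, 2 * 1024 * 1024)
print(f"quoted example: {quoted:.1%} of path capacity")

# Hypothetical LAN case: 1 Gb/s link, assumed 0.2 ms round trip time.
lan_bdp = bdp_bytes(1e9, 0.0002)
print(f"LAN BDP: {lan_bdp:.0f} bytes, "
      f"about {lan_bdp / 1500:.1f} packets of 1500 bytes")
```

	On a short path the BDP is only a handful of packets, which is
why window tuning matters far less on a LAN than on a long-haul link.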

	When I upgraded my internal paths to 1G, full-duplex,
switched router, I went so far (for testing purposes) as to
eliminate all the switches (all 3 of them) between my Win-client
and my Samba-server.  No difference in throughput. I made sure
all of my switches and the network cards under test had the
jumbo-frames feature.  That boosted throughput significantly, but
unfortunately, unlike dynamic window-sizing, MTU tests aren't
stored between computers but are stored per-segment.

	I sorta think it is a 'bug' but using a larger
MTU, I noticed things appeared to work fine on interactive
use (small packets), but any bulk transfers that
used the larger packet sizes were **SILENTLY** truncated,
so I couldn't expect dynamic MTU adjustment per path.

	"If only" besides MTU /interface, MTU could
be stored / host, but there was no way to easily (quickly)
compute MTU size "on the fly".  Even if there were, there
was no place to store it.  I think I came to the conclusion
that a separate "MTU-database" (much like the routing
database) would need to be constructed with the end user
needing to accurately find out the jumbo size each computer
supported.  Got real messy.

	I also "hoped" that the two faster machines might have
large parts of their latency reduced by running on faster
machines, but not too helpful in my scenario.  I did make
sure my TCP-window sizes were set up to their max supported
(at the time, think it was 256K).  Still didn't help since
the rtt between the fastest computers was sufficiently low
as to not allow multiple segments to be outstanding at
Gigabit speeds between physical interfaces.  (I did try using
different logical interfaces between paths, but was defeated
when the logical merged into the common physical.)  Not all of
my machines could support another network card that supported
jumbo frames (and vendors didn't agree on the size of "jumbo"
(sigh)).

	Unfortunately, it would have required some reworking
of the TCP/IP protocol in a non-backwards compatible method, to
support different jumbo packet sizes based on the final target.

	When I updated some of my HW to two Dell-690 workstations
(at the time still running I'd hoped to up the packet size or
hoped for some speed improvements, but the built-in GB controllers
in Dell's workstations didn't support jumbo packets, and the RTT
of 1500-byte packets didn't allow for multiple packets to be
outstanding (as the transceivers were saturated).  The end-to-end
latency increased proportional to greater top-level application
delay (which was why ssh's delay was more important than raw
bandwidth).  Oddly (side note), adding back the two switches
between the fastest machines had no measurable effect on the
ping end-to-end minimum latency.

	Thus my desire to "decrease" end-to-end latency at
the application level and my desire to try a null-cipher in ssh,
as I already knew the application was the limiting factor (compared
to CIFS, it was half as fast or so).  Increasing ssh's outstanding
packets might help the app-to-app latency, but I wanted to bench
the null-cipher, figuring that would reduce my ssh-to-ssh
latency to the minimum possible (removing cipher, only thing
left is overhead of ssh's "packetizing" of the data).

> In multiplexed applications like SSH it's necessary to include an 
> application layer receive window. 
---
	yup -- which was why I wanted to bench the app's throughput
sans the compute-heavy encrypting -- giving me an upper bound.

> This rides on top of the underlying 
> TCP receive window and your effective receive window for the application 
> will be the minimum of the two. SSH, up until 4.7, had a 64KB receive 
> window. It's been boosted to ~1MB and that really helps a lot of people.
---
	Should help long-haul, fast connections with higher latencies.
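
	A rough sketch of why the bigger application window matters on
long paths: the effective window is the minimum of the TCP and
application windows, and at most one window's worth of data can be
delivered per round trip.  The 64KB and ~1MB figures come from the
quoted text; the 4MB TCP window and both round trip times are
assumptions chosen only for illustration.

```python
KB = 1024
MB = 1024 * KB


def max_throughput(tcp_window, app_window, rtt_s):
    """Upper bound in bytes/s: one effective window per round trip."""
    effective_window = min(tcp_window, app_window)
    return effective_window / rtt_s


TCP_WINDOW = 4 * MB   # assumed; large enough not to be the limit
LONG_HAUL_RTT = 0.05  # assumed 50 ms round trip
LAN_RTT = 0.0002      # assumed 0.2 ms round trip

old_cap = max_throughput(TCP_WINDOW, 64 * KB, LONG_HAUL_RTT)
new_cap = max_throughput(TCP_WINDOW, 1 * MB, LONG_HAUL_RTT)
lan_cap = max_throughput(TCP_WINDOW, 64 * KB, LAN_RTT)

print(f"64KB window, 50 ms:  {old_cap / MB:.2f} MB/s")
print(f"1MB window, 50 ms:   {new_cap / MB:.2f} MB/s")
print(f"64KB window, 0.2 ms: {lan_cap / MB:.1f} MB/s")
```

	Under these assumptions the 64KB window is a hard ceiling on the
long-haul path, while on the LAN the same window allows more than
gigabit line rate, so something other than the window is the limit there.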

>>     One of the machines is pegged: an aging 2x1GHz PIII.  It's hard to
>> say what is happening where, since I'm working with 3 different 
>> machine types
>> (4 if you count 1 running WinXP vs. other running Linux).
> 
> Yeah, that's a bottleneck right there. None cipher will do a lot for you.
---
	Hoped there would be some cipher that would run faster, but that
wasn't my finding.  At least I could "highlight" the effect on speed
of the slower CPU vs. same network conditions for faster CPU machines
without adding in constant (not related to path length or latency)
delay due to crypto-calculations.

>>     The file transfer speed via scp to a cpu-limited
>> target (ish, below) is 10.4MB/s.  The same file transferred over CIFS,
>> a file-system protocol, runs at 28.7MB/s.  Network tuning isn't the
>> issue, though "end-to-end" latency caused by ssh may be.  Someone
> 
> It may be but the amount of latency imposed tends to be small. I've not 
> taken the time to quantify this though.
---
	That's where I was headed.  SMB gave me the fastest
app-to-app performance.

>>     Haven't found a source for Iperf yet.  But I get nearly
>> 2x performance over SSH with just using SMB.  I doubt disk is playing
>> much of a role (note, that the 250MB file I'm xfering fits in the
>> buffer cache of 3 out of 4 of the machines).
> 
> http://dast.nlanr.net/Projects/Iperf
---
	found it -- wouldn't "config&make" on my windows machine.  Even
so, the latency indicated by ping for 8K packets seemed to be the
limiting factor, since the ping-time to the lower-cpu-power, "ish"
was 10-15% faster, indicating the app-to-app delay was more related to
higher-OSI level latencies (including the application perf).
> 
>>     Was preferring to have the standard ssh support it.  Obviously,
>> I have the sources and can hack something with-or-without a patch, but
>> I don't want to get into modifying or doing a "special compile" for yet
>> another piece of software (have done it with a few, and eventually I tend
>> to give up my 'special mods' over simplicity). 
> 
> In the HPN-SSH code the client must use both
> -oNoneEnabled=yes and -oNoneSwitch=yes in order to use the None switch. 
> NoneEnabled can be set in the ssh_config but NoneSwitch must come on the 
> command line. When it enters the NONE cipher it spits out a warning to 
> stderr so users are aware that the switch happened.
---
	And that wouldn't be enough to protect users from any
increased security risk (would still want host-specific enabling
on host -- to only allow such connections "internally", or to
specific external hosts).
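
	The kind of host-specific enabling I have in mind could be
sketched as a small wrapper that only adds the two options named in
the quoted text for hosts on an allowlist.  The allowlist contents and
the function name are hypothetical, and the options only mean anything
to an HPN-patched ssh client; this builds the command line but does
not run it.

```python
# Hypothetical allowlist of internal hosts where the None cipher is acceptable.
INTERNAL_HOSTS = {"ish", "samba-server"}


def ssh_argv(host, allowed=INTERNAL_HOSTS):
    """Build an ssh command line, adding None-cipher options only if allowed."""
    argv = ["ssh"]
    if host in allowed:
        # Both options are required by HPN-SSH, per the quoted description.
        argv += ["-oNoneEnabled=yes", "-oNoneSwitch=yes"]
    argv.append(host)
    return argv


print(" ".join(ssh_argv("ish")))
print(" ".join(ssh_argv("example.org")))
```

	This is client-side only; the server would still need its own
per-host restriction to refuse such connections from anywhere else.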

> 
> While I would like to see the None switch in the base code I'm not 
> entirely sure it's going to happen.
----
	Well, that's an annoyance -- Other than user-protection,
what other problems would there be with that feature?  Seems like
protections are almost all in place to prevent accidental security
flaws.  Are there other technical reasons, or are the remaining
reasons based on politics or people's personal egos?  While I
would think the first could be solved, the latter two aren't usually
worth the effort to fight -- though making it clear that it isn't
a technical reason and is someone's personal preference isn't always
the easiest to ascertain.  :~!

>> What is "HPN"...I don't recognize it as a cipher.
> 
> It's not a cipher. It stands for High Performance Networking and is a set 
> of modifications to improve SSH performance. It's typically geared 
> towards fast networks but it does have some benefits for people in your 
> situation. More information can be found at
> http://www.psc.edu/networking/projects/hpn-ssh
---
	Ahhh...will have to check it out.

> Currently it's being used by NASA, several governmental organizations, 
> it's part of HP-UX, several linux distros, it's available through FreeBSD, 
> many supercomputing centers use it, a bunch of financial institutions, 
> tech companies, and so forth.
---
	Sure seems like it would be useful to add to the standard product
since HPN-requiring sites only start with government entities, but
eventually trickle down into the civilian sector.

>> As for "production"...
>> if I wanted this at a company, I'd tend to believe I was "insane".
>> "Deliberately disabling encryption at the server and client in a
>> production environment"? 
> 
> A lot of widely used applications do not encrypt bulk data transfers 
> even if they use encryption for authentication. So I'm not sure I would 
> say it was insane, just a different approach.
----
	I was just thinking of some case where there might have
been some "accident" where something important got transferred
unencrypted -- and higher ups would look for a scapegoat (ignoring
the fact that it is perfectly safe under the conditions for which
it was intended).  All too often managers and other
"performance-averaging" types that don't care if it is safe in
the "designed" uses...they just want to blame someone who is doing
anything out of the norm.  Not that I'd have any experience with
such "Dilbertian" management structures. :~/

	Will put the HPN on my list -- though it sounds like
rebuilding all the clients/servers to 4.7 might also be a good
first step just to check if app-to-app latency is helped by the
possibility of larger window sizes.

Thanks...