[PATCH] Improve endian conversion in umac.c

rapier rapier at psc.edu
Wed Mar 9 09:51:51 AEDT 2022


Howdy all,

I was poking at the MAC routines looking for some efficiencies for high 
performance environments. I was comparing umac.c to the original source 
at https://fastcrypto.org/front/umac/umac.c. After a couple of false 
starts I found that reverting the endian conversion routines back to 
what Krovetz wrote yielded an 8% to 16% improvement in throughput with 
aes256-ctr. For example, 715MB/s vs 856MB/s. I may be missing a 
portability or security reason for using the get/put routines in misc.c, 
though.
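
On the portability question, here is a minimal sketch of the trade-off. 
It is not taken from the patch or from misc.c, and all of the names in 
it (load_le_bytewise, load_le_memcpy, swap32, host_is_little_endian, 
check_loads_agree) are hypothetical. A byte-at-a-time load works at any 
alignment on any host. A word-sized load through a pointer cast can be 
faster but may trap on strict-alignment CPUs and runs into C aliasing 
rules. Copying the word with memcpy and then swapping is one common 
middle ground. Compilers often reduce a fixed-size memcpy to a single 
load, but that is an assumption to check on each target, not something 
measured here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a byte-at-a-time little-endian load.
 * Works for any alignment and any host byte order. */
static uint32_t load_le_bytewise(const void *p)
{
    const unsigned char *b = p;
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Reverse the four bytes of a 32-bit word (same shifts as the patch). */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x & 0x00FF0000u) >> 8) |
           ((x & 0x0000FF00u) << 8) | (x << 24);
}

/* Run-time host byte order probe: 1 on little-endian, 0 on big-endian. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Word-sized little-endian load that copies with memcpy instead of
 * dereferencing a cast pointer, so misaligned input is still defined. */
static uint32_t load_le_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return host_is_little_endian() ? v : swap32(v);
}

/* Self-check: both loads must agree on a deliberately misaligned pointer. */
static void check_loads_agree(void)
{
    unsigned char buf[8] = {0x00, 0x78, 0x56, 0x34, 0x12, 0, 0, 0};
    assert(load_le_bytewise(buf + 1) == 0x12345678u);
    assert(load_le_memcpy(buf + 1) == 0x12345678u);
}
```

Calling check_loads_agree() from a test program exercises both paths on 
an odd address, which is the case where a plain *(UINT32 *)ptr 
dereference is not guaranteed to be safe.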

Test bed 1: 2 10Gb connected AMD Epyc 32 core hosts. RTT < .5ms
Test bed 2: Intel Xeon x5675, AMD Ryzen 7 5800X 10Gb connected.
	    RTT < .5ms

I saw larger performance gains in test bed 1 than in test bed 2 (16% vs 
8%), but I think the gains are proportional to the number of packets. The 
Epycs can push data about twice as fast as the Xeon can.

In 1Gb environments I'm not seeing any benefit in throughput as I can 
max out that path with stock.

Anyway, I wanted to make this patch available for discussion.

Chris

diff --git a/umac.c b/umac.c
index e5ec19f0..d5de3806 100644
--- a/umac.c
+++ b/umac.c
@@ -134,17 +134,33 @@ typedef unsigned int      UWORD;  /* Register */
  /* --- Endian Conversion --- Forcing assembly on some platforms           */
  /* ---------------------------------------------------------------------- */

+static UINT32 LOAD_UINT32_REVERSED(void *ptr)
+{
+    UINT32 temp = *(UINT32 *)ptr;
+    temp = (temp >> 24) | ((temp & 0x00FF0000) >> 8 )
+         | ((temp & 0x0000FF00) << 8 ) | (temp << 24);
+    return (UINT32)temp;
+}
+
+static void STORE_UINT32_REVERSED(void *ptr, UINT32 x)
+{
+    UINT32 i = (UINT32)x;
+    *(UINT32 *)ptr = (i >> 24) | ((i & 0x00FF0000) >> 8 )
+                   | ((i & 0x0000FF00) << 8 ) | (i << 24);
+}
+
+/* The following definitions use the above reversal-primitives to do the right
+ * thing on endian specific load and stores.
+ */
+
  #if (__LITTLE_ENDIAN__)
-#define LOAD_UINT32_REVERSED(p)                get_u32(p)
-#define STORE_UINT32_REVERSED(p,v)     put_u32(p,v)
+#define LOAD_UINT32_LITTLE(ptr)     (*(UINT32 *)(ptr))
+#define STORE_UINT32_BIG(ptr,x)     STORE_UINT32_REVERSED(ptr,x)
  #else
-#define LOAD_UINT32_REVERSED(p)                get_u32_le(p)
-#define STORE_UINT32_REVERSED(p,v)     put_u32_le(p,v)
+#define LOAD_UINT32_LITTLE(ptr)     LOAD_UINT32_REVERSED(ptr)
+#define STORE_UINT32_BIG(ptr,x)     (*(UINT32 *)(ptr) = (UINT32)(x))
  #endif

-#define LOAD_UINT32_LITTLE(p)          (get_u32_le(p))
-#define STORE_UINT32_BIG(p,v)          put_u32(p, v)
-
  /* ---------------------------------------------------------------------- */
  /* ---------------------------------------------------------------------- */
  /* ----- Begin KDF & PDF Section ---------------------------------------- */


More information about the openssh-unix-dev mailing list