[PATCH] Improve endian conversion in umac.c
rapier
rapier at psc.edu
Wed Mar 9 09:51:51 AEDT 2022
Howdy all,
I was poking at the MAC routines looking for some efficiencies for high
performance environments. I compared umac.c to the original source at
https://fastcrypto.org/front/umac/umac.c. After a couple of false starts
I found that reverting the endian conversion routines back to what
Krovetz wrote gave an 8% to 16% improvement in throughput with
aes256-ctr. For example, 715MB/s vs 856MB/s. I may be missing something
about the get/put routines in misc.c in terms of portability or
security, though.
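For anyone weighing the portability question, here is a rough sketch (not taken from OpenSSH or from Krovetz's code) of the two load styles side by side. All function names are made up for illustration. load_be32_bytes assembles the value one byte at a time, which is assumed here to be roughly what a get_u32-style helper does. load_be32_word reads a whole word and byte-swaps it on little-endian hosts, using memcpy rather than a pointer cast so that unaligned input stays well-defined.

```c
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time big-endian load: makes no alignment or aliasing
 * assumptions. Hypothetical name, for illustration only. */
static uint32_t load_be32_bytes(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
        ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Unconditional 32-bit byte reversal, using the same shifts and masks
 * as the reversal routines in the patch. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v & 0x00FF0000u) >> 8) |
        ((v & 0x0000FF00u) << 8) | (v << 24);
}

/* Runtime byte-order probe: returns 1 on a little-endian host. */
static int host_is_little_endian(void)
{
    const uint32_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Word-at-a-time big-endian load. memcpy stands in for the
 * *(UINT32 *)ptr cast so the read is defined for any alignment. */
static uint32_t load_be32_word(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return host_is_little_endian() ? swap32(v) : v;
}
```

Both functions return 0x00010203 for the bytes 00 01 02 03, regardless of host byte order or buffer alignment. Whether a compiler collapses either form into a single load plus byte swap depends on the compiler and flags, so any throughput difference would still need to be measured.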
Test bed 1: Two AMD Epyc 32-core hosts, 10Gb connected. RTT < 0.5ms
Test bed 2: Intel Xeon x5675 and AMD Ryzen 7 5800X, 10Gb connected.
RTT < 0.5ms
I saw more performance gain in test bed 1 than in test bed 2 (16% vs
8%), but I think the gains are proportional to the number of packets.
The Epycs can push data about twice as fast as the Xeon can.
In 1Gb environments I'm not seeing any benefit in throughput as I can
max out that path with stock.
Anyway, I wanted to make this patch available for discussion.
Chris
diff --git a/umac.c b/umac.c
index e5ec19f0..d5de3806 100644
--- a/umac.c
+++ b/umac.c
@@ -134,17 +134,33 @@ typedef unsigned int UWORD; /* Register */
/* --- Endian Conversion --- Forcing assembly on some platforms */
/* ---------------------------------------------------------------------- */
+static UINT32 LOAD_UINT32_REVERSED(void *ptr)
+{
+ UINT32 temp = *(UINT32 *)ptr;
+ temp = (temp >> 24) | ((temp & 0x00FF0000) >> 8 )
+ | ((temp & 0x0000FF00) << 8 ) | (temp << 24);
+ return (UINT32)temp;
+}
+
+static void STORE_UINT32_REVERSED(void *ptr, UINT32 x)
+{
+ UINT32 i = (UINT32)x;
+ *(UINT32 *)ptr = (i >> 24) | ((i & 0x00FF0000) >> 8 )
+ | ((i & 0x0000FF00) << 8 ) | (i << 24);
+}
+
+/* The following definitions use the above reversal-primitives to do the right
+ * thing on endian specific load and stores.
+ */
+
#if (__LITTLE_ENDIAN__)
-#define LOAD_UINT32_REVERSED(p) get_u32(p)
-#define STORE_UINT32_REVERSED(p,v) put_u32(p,v)
+#define LOAD_UINT32_LITTLE(ptr) (*(UINT32 *)(ptr))
+#define STORE_UINT32_BIG(ptr,x) STORE_UINT32_REVERSED(ptr,x)
#else
-#define LOAD_UINT32_REVERSED(p) get_u32_le(p)
-#define STORE_UINT32_REVERSED(p,v) put_u32_le(p,v)
+#define LOAD_UINT32_LITTLE(ptr) LOAD_UINT32_REVERSED(ptr)
+#define STORE_UINT32_BIG(ptr,x) (*(UINT32 *)(ptr) = (UINT32)(x))
#endif
-#define LOAD_UINT32_LITTLE(p) (get_u32_le(p))
-#define STORE_UINT32_BIG(p,v) put_u32(p, v)
-
/* ---------------------------------------------------------------------- */
/* ---------------------------------------------------------------------- */
/* ----- Begin KDF & PDF Section ---------------------------------------- */