[PATCH] Improve endian conversion in umac.c

rapier rapier at psc.edu
Thu Mar 10 07:02:31 AEDT 2022



On 3/8/22 6:12 PM, Darren Tucker wrote:
> On Wed, 9 Mar 2022 at 09:59, rapier <rapier at psc.edu> wrote:
>> I was poking at the MAC routines looking for some efficiencies for high
>> performance environments. I was looking at the umac.c and comparing it
>> to the original source at https://fastcrypto.org/front/umac/umac.c After
>> a couple of false starts I found that reverting the endian conversion
>> routines back to what Krovetz wrote realized an 8% to 16% improvement
> 
> Interesting!  One obvious difference is what you have is potentially
> inline-able static functions instead of function calls across
> compilation units that (barring whole program optimization) can't be
> inlined.  If you put the existing functions from misc.c into umac.c as
> statics do you see the same improvement?


That worked and I saw the same improvement. For a 20GB test (a dd pipe 
with aes256-ctr) I'm seeing peaks at 870MB/s versus 720MB/s for stock. 
So it does look like it's being inlined. I'm going to poke at a 
couple more things and then provide an updated patch. I think I have a 
big endian system around here somewhere so I want to test on that as well.
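
For anyone following along, here is a rough sketch of the kind of change 
being discussed: byte-order helpers defined as static inline in the same 
file that calls them, so the compiler can inline them into the hot loop 
without needing whole program optimization. The sketch_* names are 
hypothetical and this is not the actual code from misc.c or umac.c; it 
only assumes the helpers do a byte-wise big-endian load and store.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; these are not the
 * identifiers or the exact code used in OpenSSH's misc.c or umac.c.
 * Being static inline in the calling file lets the compiler inline
 * them instead of emitting a call into another compilation unit.
 */

/* Load a 32-bit big-endian value from a possibly unaligned buffer. */
static inline uint32_t
sketch_get_u32_be(const void *vp)
{
	const unsigned char *p = (const unsigned char *)vp;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store a 32-bit value into a buffer in big-endian byte order. */
static inline void
sketch_put_u32_be(void *vp, uint32_t v)
{
	unsigned char *p = (unsigned char *)vp;

	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Round-trip check: the result is the same on any host byte order. */
static void
sketch_selfcheck(void)
{
	unsigned char buf[4];

	sketch_put_u32_be(buf, 0x01020304u);
	assert(buf[0] == 0x01 && buf[3] == 0x04);
	assert(sketch_get_u32_be(buf) == 0x01020304u);
}
```

Because the helpers work a byte at a time, they should behave the same 
on little and big endian hosts, which is part of why I want to confirm 
the numbers on the big endian box too.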

This is pleasing. Initially I was looking at improving performance by 
pipelining the MAC but that's not possible with ETM. This is about the 
level of performance gain I was hoping to get with that and it's a lot 
easier.

Anyway, I'll get the new patch up soon.

Chris


More information about the openssh-unix-dev mailing list