[FFmpeg-devel] [PATCH] md5: optimize second round by using 4-operation form of G()

Mon May 20 06:02:46 CEST 2013

On 19/05/13 10:15 PM, Clément Bœsch wrote:
>> From f33000be40325dabf70be6fe631f92b7d4b031a7 Mon Sep 17 00:00:00 2001
>> From: Giorgio Vazzana <mywing81 at gmail.com>
>> Date: Sat, 18 May 2013 13:53:52 +0200
>> Subject: [PATCH] md5: optimize second round by using 4-operation form of G()
>>
>> 4-operation form is preferred over 3-operation because it breaks a long
>> dependency chain, thus allowing a superscalar processor to execute more
>> operations in parallel.
>> The idea was taken from: http://www.zorinaq.com/papers/md5-amd64.html
>> ---
>>  libavutil/md5.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/libavutil/md5.c b/libavutil/md5.c
>> index 7375ce5..e3c4981 100644
>> --- a/libavutil/md5.c
>> +++ b/libavutil/md5.c
>> @@ -84,7 +84,7 @@ static const uint32_t T[64] = { // T[i]= fabs(sin(i+1)<<32)
>>                                                                          \
>>          if (i < 32) {                                                   \
>>              if (i < 16) a += (d ^ (b & (c ^ d))) + X[       i  & 15];   \
>> -            else        a += (c ^ (d & (c ^ b))) + X[(1 + 5*i) & 15];   \
>> +            else        a += ((d & b) | (~d & c))+ X[(1 + 5*i) & 15];   \
> 
> Why not use the same trick for the i < 16 case, with something like
> (b & c) | (~b & d) ?

Check the section "Parallelism of operations in round 2" in the paper Giorgio linked.
F() (the i < 16 case in lavu) performs better with the 3-operation form.
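
To make the comparison concrete, here is a minimal sketch of the two forms of F() and G() side by side. The helper names (f3, f4, g3, g4, count_mismatches) are made up for illustration and are not in md5.c. The comments give my reading of why the forms behave differently, assuming b is the value produced last by the previous step: what matters is how many operations still have to run one after another once b is available.

```c
#include <stdint.h>

/* 3-operation F(), as used for i < 16 in git head.
 * (c ^ d) does not need b, so only the AND and the XOR wait on b. */
static inline uint32_t f3(uint32_t b, uint32_t c, uint32_t d)
{
    return d ^ (b & (c ^ d));
}

/* 4-operation F(), the form suggested in the question.
 * ~b, then the AND with d, then the OR all wait on b: three in a row. */
static inline uint32_t f4(uint32_t b, uint32_t c, uint32_t d)
{
    return (b & c) | (~b & d);
}

/* 3-operation G(), as it was before the patch.
 * b sits in the innermost term, so XOR, AND and XOR all wait on b. */
static inline uint32_t g3(uint32_t b, uint32_t c, uint32_t d)
{
    return c ^ (d & (c ^ b));
}

/* 4-operation G(), as in the patch.
 * (~d & c) does not need b, so only the AND and the OR wait on b. */
static inline uint32_t g4(uint32_t b, uint32_t c, uint32_t d)
{
    return (d & b) | (~d & c);
}

/* The functions are purely bitwise, so one triple of words whose bit
 * columns cover all eight (b, c, d) combinations is enough to check
 * that the 3- and 4-operation forms agree. Returns the number of
 * mismatching pairs (0, 1 or 2). */
int count_mismatches(void)
{
    const uint32_t b = 0xF0F0F0F0u, c = 0xCCCCCCCCu, d = 0xAAAAAAAAu;
    int bad = 0;

    if (f3(b, c, d) != f4(b, c, d))
        bad++;
    if (g3(b, c, d) != g4(b, c, d))
        bad++;
    return bad;
}
```

count_mismatches() returns 0, so both rewrites are functionally identical; only the scheduling differs, which is why the 4-operation form helps G() but hurts F().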

AMD Athlon X2 7750 x86_64

F() 3-operation (git head)
lavu    MD5      size:  1048576  runs:     1024  time:    5.610 +- 0.157
lavu    MD5      size:  1048576  runs:     1024  time:    5.613 +- 0.133
lavu    MD5      size:  1048576  runs:     1024  time:    5.617 +- 0.160
lavu    MD5      size:  1048576  runs:     1024  time:    5.614 +- 0.158

F() 4-operation
lavu    MD5      size:  1048576  runs:     1024  time:    6.250 +- 0.141
lavu    MD5      size:  1048576  runs:     1024  time:    6.253 +- 0.147
lavu    MD5      size:  1048576  runs:     1024  time:    6.256 +- 0.152
lavu    MD5      size:  1048576  runs:     1024  time:    6.250 +- 0.132
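
For a single figure, the four reported times in each group can be averaged and compared. The sketch below only does that arithmetic on the numbers above; the function names are made up for illustration, and the times are used exactly as printed by the benchmark.

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative slowdown, in percent, of the 4-operation F() against the
 * 3-operation F(), using the times from the eight runs listed above. */
double f4_slowdown_percent(void)
{
    static const double f3_times[] = { 5.610, 5.613, 5.617, 5.614 };
    static const double f4_times[] = { 6.250, 6.253, 6.256, 6.250 };

    double m3 = mean(f3_times, sizeof(f3_times) / sizeof(f3_times[0]));
    double m4 = mean(f4_times, sizeof(f4_times) / sizeof(f4_times[0]));

    return (m4 / m3 - 1.0) * 100.0;
}
```

On these numbers the 4-operation F() comes out roughly 11% slower, well outside the reported +- spread.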