[FFmpeg-devel] [PATCH] md5: optimize second round by using 4-operation form of G()
James Almer
jamrial at gmail.com
Mon May 20 06:02:46 CEST 2013
On 19/05/13 10:15 PM, Clément Bœsch wrote:
>> From f33000be40325dabf70be6fe631f92b7d4b031a7 Mon Sep 17 00:00:00 2001
>> From: Giorgio Vazzana <mywing81 at gmail.com>
>> Date: Sat, 18 May 2013 13:53:52 +0200
>> Subject: [PATCH] md5: optimize second round by using 4-operation form of G()
>>
>> 4-operation form is preferred over 3-operation because it breaks a long
>> dependency chain, thus allowing a superscalar processor to execute more
>> operations in parallel.
>> The idea was taken from: http://www.zorinaq.com/papers/md5-amd64.html
>> ---
>> libavutil/md5.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/libavutil/md5.c b/libavutil/md5.c
>> index 7375ce5..e3c4981 100644
>> --- a/libavutil/md5.c
>> +++ b/libavutil/md5.c
>> @@ -84,7 +84,7 @@ static const uint32_t T[64] = { // T[i]= fabs(sin(i+1)<<32)
>> \
>> if (i < 32) { \
>> if (i < 16) a += (d ^ (b & (c ^ d))) + X[ i & 15]; \
>> - else a += (c ^ (d & (c ^ b))) + X[(1 + 5*i) & 15]; \
>> + else a += ((d & b) | (~d & c))+ X[(1 + 5*i) & 15]; \
>
> Why not use the same trick for the i < 16 case, with something like
> (b & c) | (~b & d) ?
See the section "Parallelism of operations in round 2" in the paper Giorgio linked.
F() (the i < 16 case in lavu) performs better with the 3-operation form.
AMD Athlon X2 7750 x86_64
F() 3-operation (git head)
lavu MD5 size: 1048576 runs: 1024 time: 5.610 +- 0.157
lavu MD5 size: 1048576 runs: 1024 time: 5.613 +- 0.133
lavu MD5 size: 1048576 runs: 1024 time: 5.617 +- 0.160
lavu MD5 size: 1048576 runs: 1024 time: 5.614 +- 0.158
F() 4-operation
lavu MD5 size: 1048576 runs: 1024 time: 6.250 +- 0.141
lavu MD5 size: 1048576 runs: 1024 time: 6.253 +- 0.147
lavu MD5 size: 1048576 runs: 1024 time: 6.256 +- 0.152
lavu MD5 size: 1048576 runs: 1024 time: 6.250 +- 0.132