[FFmpeg-devel] [PATCH 2/2 v2] x86/takdsp: add avx2 versions of all functions

Andreas Rheinhardt andreas.rheinhardt at outlook.com
Sat Dec 23 13:46:47 EET 2023


Lynne:
> Dec 23, 2023, 00:53 by jamrial at gmail.com:
> 
>> On an Intel Core i7 12700k:
>>
>> decorrelate_ls_c: 814.3
>> decorrelate_ls_sse2: 165.8
>> decorrelate_ls_avx2: 101.3
>> decorrelate_sf_c: 1602.6
>> decorrelate_sf_sse4: 640.1
>> decorrelate_sf_avx2: 324.6
>> decorrelate_sm_c: 1564.8
>> decorrelate_sm_sse2: 379.3
>> decorrelate_sm_avx2: 203.3
>> decorrelate_sr_c: 785.3
>> decorrelate_sr_sse2: 176.3
>> decorrelate_sr_avx2: 99.8
>>
>> Signed-off-by: James Almer <jamrial at gmail.com>
>>
> 
> Even better on a Zen3:
> checkasm: all 8 tests passed
> decorrelate_ls_c: 111.1
> decorrelate_ls_sse2: 272.6
> decorrelate_ls_avx2: 94.1
> decorrelate_sf_c: 170.6
> decorrelate_sf_sse4: 400.1
> decorrelate_sf_avx2: 196.1
> decorrelate_sm_c: 187.6
> decorrelate_sm_sse2: 383.1
> decorrelate_sm_avx2: 179.1
> decorrelate_sr_c: 102.6
> decorrelate_sr_sse2: 272.6
> decorrelate_sr_avx2: 94.1
> 

The SSE2/SSE4 versions are slower than the C versions on Zen3? Does this
happen for other DSP code as well?
(For decorrelate_sf, the C version is still the fastest, and the gain of
AVX2 over C is small for the other three functions, too.)
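
The ratios behind that remark can be checked from the Zen3 numbers quoted
above. The following is only a minimal sketch: the table layout and the
helper name speedups() are made up for illustration and are not part of
checkasm or FFmpeg; the timings themselves are copied from the quoted output.

```python
# Zen3 checkasm timings copied from the quoted message (lower is faster).
# The dict layout and the helper below are illustrative, not an FFmpeg API.
zen3 = {
    "decorrelate_ls": {"c": 111.1, "sse2": 272.6, "avx2": 94.1},
    "decorrelate_sf": {"c": 170.6, "sse4": 400.1, "avx2": 196.1},
    "decorrelate_sm": {"c": 187.6, "sse2": 383.1, "avx2": 179.1},
    "decorrelate_sr": {"c": 102.6, "sse2": 272.6, "avx2": 94.1},
}


def speedups(timings):
    """Return {variant: c_time / variant_time}; above 1.0 means faster than C."""
    c_time = timings["c"]
    return {v: c_time / t for v, t in timings.items() if v != "c"}


for name, timings in zen3.items():
    ratios = speedups(timings)
    fastest = min(timings, key=timings.get)  # variant with the lowest timing
    line = ", ".join(f"{v}: {r:.2f}x" for v, r in ratios.items())
    print(f"{name}: {line} (fastest: {fastest})")
```

A ratio below 1.0 means the SIMD version is slower than C; with these
numbers that is the case for every SSE2/SSE4 entry and for
decorrelate_sf_avx2.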

- Andreas


