[FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions
James Darnley
jdarnley at obe.tv
Wed Nov 30 22:40:12 EET 2016
On 2016-11-30 13:57, Ronald S. Bultje wrote:
> On Wed, Nov 30, 2016 at 7:10 AM, James Darnley <jdarnley at obe.tv> wrote:
>>> Nehalem:
>>> - sse2:
>>> - complex: 4.13x faster (1514 vs. 367 cycles)
>>> - simple: 4.38x faster (1836 vs. 419 cycles)
>>>
>>> Haswell:
>>> - sse2:
>>> - complex: 3.61x faster ( 936 vs. 260 cycles)
>>> - simple: 3.97x faster (1126 vs. 284 cycles)
>>> - avx (versus sse2):
>>> - complex: 1.07x faster (260 vs. 244 cycles)
>>> - simple: 1.03x faster (284 vs. 274 cycles)
>>
>> I included the sse2 results for the Haswell to show that the avx is
>> (slightly) better.
>
>
> Ah! Now it makes sense. I had no idea why your SSE2 results changed from
> 367 (SSE2 vs. C) to 260 cycles (AVX vs. SSE2).
Great. If there are no further comments I will push later tonight.
First I need to correct the micro-architecture names. Then I will
rebase onto the latest master and push.