[FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
James Darnley
jdarnley at obe.tv
Fri Dec 2 01:56:16 EET 2016
On 2016-12-02 00:31, Carl Eugen Hoyos wrote:
> 2016-12-01 17:57 GMT+01:00 James Darnley <jdarnley at obe.tv>:
>> Yorkfield:
>> - mmx2: 2.44x faster (278 vs. 114 cycles)
>> - sse2: 3.35x faster (278 vs. 83 cycles)
>>
>> Skylake:
>> - mmx2: 1.69x faster (169 vs. 100 cycles)
>> - sse2: 2.34x faster (169 vs. 72 cycles)
>
> Is it expected (or possible) that the speed impact is so
> different for different Intel hardware?
Yes. Intel's Core i-branded processors (the generation after Yorkfield)
introduced a much better micro-architecture, which makes the scalar C
code quite a bit faster. The SIMD, on the other hand, was already so
quick that it didn't gain much.
(At least I think I remember this being the story.)
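
The quoted numbers seem consistent with that. As a rough sketch, this
compares cycle counts for the same implementation across the two
machines. The counts are copied from the benchmark quoted in this
thread; the names `cycles`, `speedup` and `generation_gain` are made up
for illustration, and it assumes the two runs are directly comparable.

```python
# Cycle counts copied from the quoted benchmark ("c" is the scalar C code).
cycles = {
    "Yorkfield": {"c": 278, "mmx2": 114, "sse2": 83},
    "Skylake": {"c": 169, "mmx2": 100, "sse2": 72, "avx": 73},
}


def speedup(before, after):
    """Return how many times fewer cycles `after` needs than `before`."""
    return before / after


def generation_gain(impl):
    """Yorkfield-to-Skylake gain for one implementation (hypothetical helper)."""
    return speedup(cycles["Yorkfield"][impl], cycles["Skylake"][impl])


# avx is left out because it was only measured on Skylake.
for impl in ("c", "mmx2", "sse2"):
    print(f"{impl}: {generation_gain(impl):.2f}x fewer cycles on Skylake")
```

By these numbers the C code gains about 1.64x between the two machines
while mmx2 and sse2 gain only about 1.14x and 1.15x, which is why the
relative speedup of the SIMD over C looks smaller on Skylake.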
>> - avx: 2.32x faster (169 vs. 73 cycles)
>
> Don't you agree that if this is true (I don't know if it is)
> the patch should not be applied as is?
I do agree and I wouldn't (deliberately) apply anything that made the
decoder slower, or not as fast as it could be.