[FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions
James Darnley
jdarnley at obe.tv
Tue Nov 29 18:14:35 EET 2016
On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
> 2016-11-29 12:52 GMT+01:00 James Darnley <jdarnley at obe.tv>:
>> sse2:
>> complex: 4.13x faster (1514 vs. 367 cycles)
>> simple: 4.38x faster (1836 vs. 419 cycles)
>>
>> avx:
>> complex: 1.07x faster (260 vs. 244 cycles)
>> simple: 1.03x faster (284 vs. 274 cycles)
>
> What are you comparing?
I stuck a timer around the call to the h264dsp function in
libavcodec/h264_mb_template.c. Using STOP_TIMER(__func__) let me get a
different message for each function created. The two functions my code
was called from were hl_decode_mb_simple_16 and hl_decode_mb_complex.
The video being decoded was one of the FATE samples, concatenated with
itself several times.
The AVX figures are relative to the SSE2 version.
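For anyone who wants to try this kind of measurement outside the tree, here
is a minimal, self-contained sketch of the approach. It does not use the real
START_TIMER/STOP_TIMER macros from libavutil/timer.h. The SKETCH_* macros,
TimerStat, dsp_stub and the decode_mb_*_sketch names are all hypothetical,
and clock() stands in for the cycle counter. The speedup() helper assumes the
"x faster" figures are simply baseline cycles divided by new cycles.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Per-label statistics; a hypothetical stand-in for what a timer macro tracks. */
typedef struct TimerStat {
    const char   *id;    /* label, here the enclosing function's name */
    clock_t       total; /* accumulated clock() ticks */
    unsigned long runs;  /* number of timed calls */
} TimerStat;

/* Sketch macros: clock() replaces the CPU cycle counter real code would use. */
#define SKETCH_START_TIMER clock_t sketch_t0 = clock();
#define SKETCH_STOP_TIMER(stat, label)        \
    do {                                      \
        (stat)->id     = (label);             \
        (stat)->total += clock() - sketch_t0; \
        (stat)->runs++;                       \
    } while (0)

/* Hypothetical placeholder for the DSP function being measured. */
static void dsp_stub(int *block, int n)
{
    for (int i = 0; i < n; i++)
        block[i] += 1;
}

/* One body, several generated functions: __func__ gives each its own label. */
#define DEFINE_DECODE_MB(name)                           \
    static void name(int *block, int n, TimerStat *stat) \
    {                                                    \
        SKETCH_START_TIMER                               \
        dsp_stub(block, n);                              \
        SKETCH_STOP_TIMER(stat, __func__);               \
    }

DEFINE_DECODE_MB(decode_mb_simple_sketch)
DEFINE_DECODE_MB(decode_mb_complex_sketch)

/* "N x faster" taken as baseline cycles divided by new cycles (assumption). */
static double speedup(double baseline_cycles, double new_cycles)
{
    assert(new_cycles > 0);
    return baseline_cycles / new_cycles;
}

/* Print one line per label, similar in spirit to a timer summary message. */
static void report(const TimerStat *stat)
{
    printf("%s: %ld ticks in %lu runs\n",
           stat->id, (long)stat->total, stat->runs);
}
```

Calling each generated function with its own TimerStat and then report()
prints one line per function name, which is the effect of passing __func__ as
the timer label when the same template body is compiled into several
functions.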