[FFmpeg-devel] [PATCH 1/2] avcodec/x86: move simple_idct to external assembly
James Darnley
jdarnley at obe.tv
Tue May 30 13:35:42 EEST 2017
On 2017-05-29 23:26, Michael Niedermayer wrote:
> On Mon, May 29, 2017 at 09:40:49PM +0200, James Darnley wrote:
>> On 2017-05-29 16:51, James Darnley wrote:
>>> ---
>>> Changes:
>>> - Changed type of d40000 constant to dwords because it gets used as dwords.
>>> - Changed or removed HAVE_MMX_INLINE preprocessor guards.
>>> - Added note about conversion from inline.
>>> - New file no longer has "2" suffix.
>>> - Whitespace (indentation and alignment).
>>>
>>> libavcodec/tests/x86/dct.c | 2 +-
>>> libavcodec/x86/Makefile | 4 +-
>>> libavcodec/x86/idctdsp_init.c | 4 -
>>> libavcodec/x86/simple_idct.asm | 889 +++++++++++++++++++++++++++++++++++++++
>>> libavcodec/x86/simple_idct.c | 929 -----------------------------------------
>>> 5 files changed, 892 insertions(+), 936 deletions(-)
>>> create mode 100644 libavcodec/x86/simple_idct.asm
>>> delete mode 100644 libavcodec/x86/simple_idct.c
>>
>> Ronald asked on IRC about the performance. The libavcodec/tests/dct
>> utility reports these numbers:
>>
>> Yorkfield:
>> - inline: IDCT SIMPLE-MMX: 15715.9 kdct/s
>> - external: IDCT SIMPLE-MMX: 15699.9 kdct/s
>>
>> Skylake-U:
>> - inline: IDCT SIMPLE-MMX: 11193.3 kdct/s
>> - external: IDCT SIMPLE-MMX: 11189.7 kdct/s
>
> It's better to benchmark by decoding some videos, as the sparseness of
> the coeffs affects speed.
Ah, quite true.

Decoding a large HD sample over many runs stays at around 220 fps and
187 s run time, both before and after the change.

I will push shortly.