[FFmpeg-devel] [PATCH v3 00/10] avcodec/vc1: Arm optimisations

Martin Storsjö martin at martin.st
Fri Apr 1 10:08:19 EEST 2022


On Fri, 1 Apr 2022, Martin Storsjö wrote:

> On Thu, 31 Mar 2022, Ben Avison wrote:
>
>> The VC1 decoder was missing lots of important fast paths for Arm, especially
>> for 64-bit Arm. This submission fills in implementations for all functions
>> where a fast path already existed and the fallback C implementation was
>> taking 1% or more of the runtime, and adds a new fast path to permit
>> vc1_unescape_buffer() to be overridden.
>> 
>> I've measured the playback speed on a 1.5 GHz Cortex-A72 (Raspberry Pi 4)
>> using `ffmpeg -i <bitstream> -f null -` for a couple of example streams:
>> 
>> Architecture:  AArch32    AArch32    AArch64    AArch64
>> Stream:        1          2          1          2
>> Before speed:  1.22x      0.82x      1.00x      0.67x
>> After speed:   1.31x      0.98x      1.39x      1.06x
>> Improvement:   7.4%       20%        39%        58%
>> 
>> `make fate` passes on both AArch32 and AArch64.
>> 
>> Changes in v2:
>> 
>> * Refactor checkasm tests to convert some macros into functions.
>> * Remove cast-to-void of checked_call.
>> * Limit 16-bit values in idctdsp checkasm test to +/-0x100.
>> * Reinstate ff_add_pixels_clamped_arm.
>> * Adapt vc1 deblocking filters to specify stride as ptrdiff_t.
>> * Add align specifiers to a few VLD/VST instructions for AArch32 deblocking
>>  filter, and adapt checkasm test not to test with tighter alignment than is
>>  encountered in normal use.
>> * Correct unescape buffer memcmp length.
>> * Update benchmarks for AArch64 idctdsp.
>
> Thanks! From a quick readthrough, this version of the patchset seems good to 
> me! I'll run it through some more testing, and push it if everything seems to 
> work fine (tomorrow or so).

Pushed now - thanks for your contribution!

// Martin
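
For readers unfamiliar with the vc1_unescape_buffer() step mentioned in the
cover letter: as far as I understand it, the routine copies a bitstream
buffer while dropping each 0x03 "emulation prevention" byte that follows two
zero bytes and precedes a byte below 0x04. The following is only a simplified
scalar sketch of that idea, not the code from the patchset or from libavcodec;
the name unescape_sketch and its signature are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar sketch of VC-1 style unescaping.
 * Copies src to dst, dropping a 0x03 byte when it is preceded by two
 * zero bytes and followed by a byte below 0x04.
 * dst must have room for at least `size` bytes.
 * Returns the number of bytes written to dst.
 */
static size_t unescape_sketch(const uint8_t *src, size_t size, uint8_t *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < size; i++) {
        if (src[i] == 0x03 && i >= 2 &&
            src[i - 1] == 0 && src[i - 2] == 0 &&
            i + 1 < size && src[i + 1] < 4) {
            /* Skip the escape byte and copy the byte after it verbatim. */
            dst[out++] = src[++i];
        } else {
            dst[out++] = src[i];
        }
    }
    return out;
}
```

A byte-at-a-time loop like this is the kind of fallback that a per-architecture
fast path can override, typically by scanning larger chunks for the 00 00 03
pattern and bulk-copying the stretches in between.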

