[FFmpeg-devel] [PATCH v3 00/10] avcodec/vc1: Arm optimisations
Martin Storsjö
martin at martin.st
Fri Apr 1 10:08:19 EEST 2022
On Fri, 1 Apr 2022, Martin Storsjö wrote:
> On Thu, 31 Mar 2022, Ben Avison wrote:
>
>> The VC1 decoder was missing lots of important fast paths for Arm, especially
>> for 64-bit Arm. This submission fills in implementations for all functions
>> where a fast path already existed and the fallback C implementation was
>> taking 1% or more of the runtime, and adds a new fast path to permit
>> vc1_unescape_buffer() to be overridden.
>>
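
For readers unfamiliar with what vc1_unescape_buffer() does, here is a minimal scalar sketch of the unescaping rule as I understand it: a 0x03 byte is dropped when it follows two zero bytes and is followed by a byte below 0x04. The name unescape_sketch() and its signature are hypothetical, chosen for illustration. This is not the FFmpeg function, nor the Arm fast path from this patchset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not FFmpeg's API).
 * Copies src to dst, dropping every 0x03 byte that follows two zero
 * bytes and precedes a byte below 0x04.
 * dst must have room for at least `size` bytes.
 * Returns the number of bytes written to dst. */
static size_t unescape_sketch(const uint8_t *src, size_t size, uint8_t *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < size; i++) {
        if (src[i] == 0x03 &&
            i >= 2 && src[i - 1] == 0x00 && src[i - 2] == 0x00 &&
            i + 1 < size && src[i + 1] < 0x04) {
            continue;           /* escape byte: skip it */
        }
        dst[out++] = src[i];    /* ordinary byte: copy through */
    }
    return out;
}
```

A byte-at-a-time loop like this is the kind of fallback that an architecture-specific override would replace with wider loads and compares.
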
>> I've measured the playback speed on a 1.5 GHz Cortex-A72 (Raspberry Pi 4)
>> using `ffmpeg -i <bitstream> -f null -` for a couple of example streams:
>>
>> Architecture:  AArch32  AArch32  AArch64  AArch64
>> Stream:        1        2        1        2
>> Before speed:  1.22x    0.82x    1.00x    0.67x
>> After speed:   1.31x    0.98x    1.39x    1.06x
>> Improvement:   7.4%     20%      39%      58%
>>
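
The Improvement row appears to be the ratio of the after speed to the before speed, minus one, expressed as a percentage. A small sketch of that arithmetic follows. The function name improvement_percent() is made up for illustration, and the formula is my reading of the table rather than something stated in the cover letter.

```c
/* Hypothetical helper: relative speed-up in percent, given the
 * playback speed multipliers reported before and after the patches. */
static double improvement_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

For example, improvement_percent(1.00, 1.39) gives roughly 39, and improvement_percent(0.67, 1.06) gives roughly 58, which agrees with the AArch64 columns.
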
>> `make fate` passes on both AArch32 and AArch64.
>>
>> Changes in v2:
>>
>> * Refactor checkasm tests to convert some macros into functions.
>> * Remove cast-to-void of checked_call.
>> * Limit 16-bit values in idctdsp checkasm test to +/-0x100.
>> * Reinstate ff_add_pixels_clamped_arm.
>> * Adapt vc1 deblocking filters to specify stride as ptrdiff_t.
>> * Add align specifiers to a few VLD/VST instructions for AArch32 deblocking
>>   filter, and adapt checkasm test not to test with tighter alignment than is
>>   encountered in normal use.
>> * Correct unescape buffer memcmp length.
>> * Update benchmarks for AArch64 idctdsp.
>
> Thanks! From a quick readthrough, this version of the patchset seems good to
> me! I'll run it through some more testing, and push it if everything seems to
> work fine (tomorrow or so).

Pushed now - thanks for your contribution!

// Martin