[FFmpeg-devel] [PATCH 00/21] aarch64: hevc: Add missing hevc_pel NEON functions
Jean-Baptiste Kempf
jb at videolan.org
Tue Mar 26 08:01:43 EET 2024
On Mon, 25 Mar 2024, at 22:56, J. Dekker wrote:
>> On Mon, 25 Mar 2024, Martin Storsjö wrote:
>>
>>> For some time now, we have had pretty complete AArch64 NEON coverage
>>> for the hevc decoder.
>>>
>>> However, some of these functions require the I8MM instruction set
>>> extension, and many of them (but not all) lack a plain NEON
>>> version.
>>>
>>> This patchset fills in a regular NEON version of all functions
>>> where we have an I8MM function.
>>>
>>> For context: the I8MM instruction set extension is a mandatory
>>> part of armv8.6-a. E.g. Apple M2, AWS Graviton 3 have it,
>>> but Apple M1 and Ampere Altra don't.
>>>
>>> This patchset takes decoding of a 1080p HEVC clip from 402
>>> fps to 649 fps on an Apple M1.
>>>
>>> Patch #2 also fixes a subtle bug in the existing implementation;
>>> two functions relied on the contents of the stack, below the
>>> stack pointer, being untouched within a function. If a signal
>>> gets delivered, those parts of the stack could be clobbered.
>>
>> I know this is a bit short notice for a patchset of this size, but would people be OK with merging this patchset before the impending 7.0 branch (which will be made within the next 24h)?
>>
>> The patches pass all my tricky build configurations, they give a very non-negligible speedup on many common CPUs, and patch #2 fixes a real bug in the existing implementations. (A bug fix patch can of course be backported after the branch too, but performance optimizations aren't generally relevant for backporting.)
>>
>> // Martin
>
> Yes, please. I will push tomorrow morning if you haven’t already.
+1
--
Jean-Baptiste Kempf - President
+33 672 704 734
https://jbkempf.com/