[FFmpeg-devel] [PATCH] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

Martin Storsjö martin at martin.st
Wed Feb 14 11:42:45 EET 2024


Hi,

On Sun, 4 Feb 2024, Ramiro Polla wrote:

> The code is imported from libjpeg-turbo-3.0.1. The neon registers used
> have been changed to avoid modifying v8-v15.
> ---

I don't remember whether we have any extra steps to follow when importing
foreign code under a different license. The license here seems fine in any
case, though.

This seems to work fine in all my test environments. And thanks for making 
sure it doesn't use v8-v15!

I'm not so familiar with these DSP functions, or with whether it is the
norm to add a new constant like FF_DCT_NEON, but it seems to match the
pattern of the existing code.


I presume the main case that tests this is "make fate-dct8x8", which
builds and executes libavcodec/tests/dct? How much work would it be to
integrate testing of these routines into checkasm? That way we could rest
assured that the assembly passes all the ABI checks that we do there,
including the check for registers that must not be clobbered.


The assembly uses a different indentation width than the rest of our 
assembly. I recently spent some effort on cleaning that up so that our 
code is mostly consistent, so I'd prefer not to add new code that deviates 
from it. It primarily looks like you'd need to add 4 spaces at the start 
of each line.
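
As a rough illustration of that simple case (this is not the
ffmpeg-asm-indent.pl script, only a minimal sketch): the function name
indent_asm and the rule that lines starting in column 0 (labels,
directives, macro invocations) are left untouched are assumptions made
for the example, not something taken from the patch.

```python
def indent_asm(source: str, extra: int = 4) -> str:
    """Prepend `extra` spaces to every line that is already indented.

    Assumption: lines starting in column 0 (labels, directives, macro
    invocations) and blank lines keep their position; only indented
    instruction lines are shifted to the right.
    """
    pad = " " * extra
    out = []
    for line in source.splitlines(keepends=True):
        is_blank = line.strip() == ""
        is_indented = line[:1] in (" ", "\t")
        if is_indented and not is_blank:
            out.append(pad + line)
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical input: a label followed by 4-space-indented instructions.
    sample = "my_label:\n    ld1 {v0.8h}, [x0]\n    ret\n"
    print(indent_asm(sample), end="")
```

The output would still need a manual check, since a rule this naive knows
nothing about operand alignment or comments.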

I've used a script for mostly automatically reindenting our arm assembly;
you can grab it at https://martin.st/temp/ffmpeg-asm-indent.pl and run it
as "cat file.S | ./ffmpeg-asm-indent.pl > tmp; mv tmp file.S". It's not
100% accurate, but it mostly gets you there, so it's still good to check
the result manually afterwards.

// Martin


