[FFmpeg-devel] [PATCH] avutil/aarch64/float_dsp_neon: Refactor ff_vector_fmul_add_neon
Krzysztof Pyrkosz
ffmpeg at szaka.eu
Thu Jan 23 23:31:05 EET 2025
On Sun, Jan 19, 2025 at 10:57:57PM +0200, Martin Storsjö wrote:
> On Sun, 19 Jan 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
>
> > Removed a branch, unrolled loop. Speed increase bumped from 3.95 to 5.60.
>
> On what core is that? Please quote the actual output including the absolute
> numbers.
>
> I'm getting much more inconclusive numbers for this one:
>
> Before:                 Cortex A53     A72     A73     A78
> vector_fmul_add_neon:        620.0   257.2   624.5   162.8
> After:
> vector_fmul_add_neon:        767.0   259.2   767.5   110.5
>
> This seems to make things quite a lot slower on 2 of these 4 cores. On the
> A78, I'm getting numbers that look like yours though.
>
> So while it makes things better on one kind of core, it also regresses
> things quite a bit on others, so I'm not quite as convinced about this one.

Indeed, I ran the tests and gathered the benchmark results on the A78
(a Rock5B, to be more specific).

> Doing that change, i.e.
>
>         fmla            v4.4s, v0.4s, v2.4s
>         fmla            v5.4s, v1.4s, v3.4s
> +       subs            w4, w4, #16
>         stp             q4, q5, [x0], #32
> -       sub             w4, w4, #16
> -       cbnz            w4, 1b
> +       b.gt            1b
>         ret
>
> has this effect on numbers:
>
> Before:                 Cortex A53     A72     A73     A78
> vector_fmul_add_neon:        767.0   259.2   769.5   109.0
> After:
> vector_fmul_add_neon:        751.0   254.5   751.0   109.2
>
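
For reference, here is a rough Python model of the two loop endings in
the diff above. It is only a sketch: the function names and the `limit`
guard are hypothetical, and it models the iteration count only, not
instruction scheduling or timing. Both forms run the same number of
iterations when the length is a positive multiple of 16; they differ
only for other lengths, where the `cbnz` form never sees the counter
reach exactly zero.

```python
def iterations_cbnz(count, step=16, limit=1000):
    """Model `sub w4, w4, #16; cbnz w4, 1b`: loop until the counter is exactly zero.

    `limit` is a hypothetical safety guard so the model itself cannot spin forever.
    """
    n = 0
    while True:
        n += 1                      # one pass through the loop body
        count -= step               # sub w4, w4, #16
        if count == 0 or n >= limit:
            return n                # cbnz falls through only when w4 == 0


def iterations_bgt(count, step=16, limit=1000):
    """Model `subs w4, w4, #16; b.gt 1b`: loop while the counter is still positive."""
    n = 0
    while True:
        n += 1                      # one pass through the loop body
        count -= step               # subs w4, w4, #16 (also sets the flags)
        if count <= 0 or n >= limit:
            return n                # b.gt falls through once w4 <= 0


if __name__ == "__main__":
    for length in (16, 64, 256):
        print(length, iterations_cbnz(length), iterations_bgt(length))
```

For example, both functions return 4 for a length of 64.
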

Below are my results for different cores I have lying around. First is
the mainline version, second uses the patch from the original email,
third improves things by reordering instructions:

- vector_fmul_neon
                       A72     A78   Thinkpad x13s
Mainline:           218.42   114.7           62.85
Original patch:     223.92   86.16           64.77
Reordered instr:    221.75   85.66           61.16

- vector_fmul_add_neon
                       A72     A78   Thinkpad x13s
Mainline:           269.49  163.43           84.48
Original patch:     269.42  114.16           85.31
Reordered instr:    266.92  111.62           85.04
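
To make the tables easier to compare, the relative change against
mainline can be computed with a few lines of Python. This is a minimal
sketch: the numbers are copied from the tables above, the helper names
`percent_change` and `changes_vs_mainline` are hypothetical, and it
assumes lower values mean faster code, as with checkasm's benchmark
output.

```python
def percent_change(baseline, value):
    """Relative change of value against baseline, in percent (negative = lower)."""
    return (value - baseline) / baseline * 100.0


CORES = ["A72", "A78", "Thinkpad x13s"]

# Numbers copied from the tables in this mail.
RESULTS = {
    "vector_fmul_neon": {
        "Mainline": [218.42, 114.7, 62.85],
        "Original patch": [223.92, 86.16, 64.77],
        "Reordered instr": [221.75, 85.66, 61.16],
    },
    "vector_fmul_add_neon": {
        "Mainline": [269.49, 163.43, 84.48],
        "Original patch": [269.42, 114.16, 85.31],
        "Reordered instr": [266.92, 111.62, 85.04],
    },
}


def changes_vs_mainline(results):
    """Return, per function and per variant, the percent change on each core."""
    out = {}
    for func, rows in results.items():
        base = rows["Mainline"]
        out[func] = {
            name: [percent_change(b, v) for b, v in zip(base, values)]
            for name, values in rows.items()
            if name != "Mainline"
        }
    return out


if __name__ == "__main__":
    for func, rows in changes_vs_mainline(RESULTS).items():
        print(func)
        for name, deltas in rows.items():
            cells = "  ".join(f"{c}: {d:+.1f}%" for c, d in zip(CORES, deltas))
            print(f"  {name:<16} {cells}")
```

On these numbers the A78 column drops by roughly 25 to 32 percent, while
the other two columns stay within about 3 percent of mainline.
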
> That makes things a little bit better on A53,A72,A73, but it's still overall
> a notable regression on the A53 and A73.

Let's skip this patch and the two others then. Thank you for the review.

Krzysztof