[FFmpeg-devel] [PATCH] swscale: aarch64: Optimize the final summation in the hscale routine

Fri Apr 22 10:50:50 EEST 2022

On Thu, 21 Apr 2022, Swinney, Jonathan wrote:

> Thanks for making this improvement. I will rebase my patches on your change. I also measured the performance on AWS Graviton 2 and 3. I added the numbers to your table.
>
> Before:                     Cortex A53      A72      A73  Graviton 2  Graviton 3
> hscale_8_to_15_width8_neon:     8273.0   4602.5   4289.5      2429.7      1629.1
> hscale_8_to_15_width16_neon:   12405.7   6803.0   6359.0      3549.0      2378.4
> hscale_8_to_15_width32_neon:   21258.7  11491.7  11469.2      5797.2      3919.6
> hscale_8_to_15_width40_neon:   25652.0  14173.7  12488.2      6893.5      4810.4
>
> After:
> hscale_8_to_15_width8_neon:     7633.0   3981.5   3350.2      1980.7      1261.1
> hscale_8_to_15_width16_neon:   11666.7   5951.0   5512.0      3080.7      2131.4
> hscale_8_to_15_width32_neon:   20900.7  10733.2   9481.7      5275.2      3862.1
> hscale_8_to_15_width40_neon:   24826.0  13536.2  11502.0      6397.2      4731.9

Thanks for the benchmarks! I have now pushed this patch, with those numbers
included.

// Martin