[FFmpeg-devel] [PATCH 06/10] avcodec/vc1: Arm 32-bit NEON deblocking filter fast paths

Martin Storsjö martin at martin.st
Wed Mar 30 15:37:51 EEST 2022


On Fri, 25 Mar 2022, Ben Avison wrote:

> checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C
> version can still outperform the NEON version in specific cases. The balance
> between different code paths is stream-dependent, but in practice the best
> case happens about 5% of the time, the worst case happens about 40% of the
> time, and the complexity of the remaining cases falls somewhere in between.
> Therefore, taking the average of the best and worst case timings is
> probably a conservative estimate of the degree by which the NEON code
> improves performance.
>
> vc1dsp.vc1_h_loop_filter4_bestcase_c: 19.0
> vc1dsp.vc1_h_loop_filter4_bestcase_neon: 48.5
> vc1dsp.vc1_h_loop_filter4_worstcase_c: 144.7
> vc1dsp.vc1_h_loop_filter4_worstcase_neon: 76.2
> vc1dsp.vc1_h_loop_filter8_bestcase_c: 41.0
> vc1dsp.vc1_h_loop_filter8_bestcase_neon: 75.0
> vc1dsp.vc1_h_loop_filter8_worstcase_c: 294.0
> vc1dsp.vc1_h_loop_filter8_worstcase_neon: 102.7
> vc1dsp.vc1_h_loop_filter16_bestcase_c: 54.7
> vc1dsp.vc1_h_loop_filter16_bestcase_neon: 130.0
> vc1dsp.vc1_h_loop_filter16_worstcase_c: 569.7
> vc1dsp.vc1_h_loop_filter16_worstcase_neon: 186.7
> vc1dsp.vc1_v_loop_filter4_bestcase_c: 20.2
> vc1dsp.vc1_v_loop_filter4_bestcase_neon: 47.2
> vc1dsp.vc1_v_loop_filter4_worstcase_c: 164.2
> vc1dsp.vc1_v_loop_filter4_worstcase_neon: 68.5
> vc1dsp.vc1_v_loop_filter8_bestcase_c: 43.5
> vc1dsp.vc1_v_loop_filter8_bestcase_neon: 55.2
> vc1dsp.vc1_v_loop_filter8_worstcase_c: 316.2
> vc1dsp.vc1_v_loop_filter8_worstcase_neon: 72.7
> vc1dsp.vc1_v_loop_filter16_bestcase_c: 62.2
> vc1dsp.vc1_v_loop_filter16_bestcase_neon: 103.7
> vc1dsp.vc1_v_loop_filter16_worstcase_c: 646.5
> vc1dsp.vc1_v_loop_filter16_worstcase_neon: 110.7
>
> Signed-off-by: Ben Avison <bavison at riscosopen.org>
> ---
> libavcodec/arm/vc1dsp_init_neon.c |  14 +
> libavcodec/arm/vc1dsp_neon.S      | 643 ++++++++++++++++++++++++++++++
> 2 files changed, 657 insertions(+)

This looks like a close analogue to the arm64 case (i.e. looks good!); the
only open question is code sharing/reuse between the horizontal and vertical
filters.
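
As a rough illustration of the averaging estimate described in the quoted
commit message, the sketch below takes the midpoint of the best-case and
worst-case checkasm timings for each function and divides the C midpoint by
the NEON midpoint. The timings are copied from the quoted benchmark table;
the helper names (TIMINGS, midpoint, estimated_speedup) are made up for this
example and are not part of FFmpeg or checkasm.

```python
# checkasm timings copied from the quoted benchmark table:
# (best-case, worst-case) for the C and NEON versions of each function.
TIMINGS = {
    "vc1_h_loop_filter4":  {"c": (19.0, 144.7), "neon": (48.5, 76.2)},
    "vc1_h_loop_filter8":  {"c": (41.0, 294.0), "neon": (75.0, 102.7)},
    "vc1_h_loop_filter16": {"c": (54.7, 569.7), "neon": (130.0, 186.7)},
    "vc1_v_loop_filter4":  {"c": (20.2, 164.2), "neon": (47.2, 68.5)},
    "vc1_v_loop_filter8":  {"c": (43.5, 316.2), "neon": (55.2, 72.7)},
    "vc1_v_loop_filter16": {"c": (62.2, 646.5), "neon": (103.7, 110.7)},
}


def midpoint(best, worst):
    """Plain average of the best-case and worst-case timings."""
    return (best + worst) / 2.0


def estimated_speedup(name):
    """Ratio of the C midpoint to the NEON midpoint for one function."""
    c_mid = midpoint(*TIMINGS[name]["c"])
    neon_mid = midpoint(*TIMINGS[name]["neon"])
    return c_mid / neon_mid


if __name__ == "__main__":
    for name in TIMINGS:
        print(f"{name}: {estimated_speedup(name):.2f}x")
```

With these numbers every midpoint ratio comes out above 1, even though the
NEON version is slower than C in each best case. Since the commit message
reports the worst case as occurring far more often than the best case
(about 40% versus about 5%), an equal-weight midpoint is likely to
understate the real gain, which is the sense in which the estimate is
described as conservative.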

// Martin


