[FFmpeg-devel] [PATCH 3/3] avcodec/aarch64: add hevc deblock NEON

Martin Storsjö martin at martin.st
Wed Feb 21 14:08:16 EET 2024


On Wed, 21 Feb 2024, J. Dekker wrote:

> Benched using single-threaded full decode on an Ampere Altra.
>
> Bpp Before  After  Speedup
> 8   73,3s   65,2s  1.124x
> 10  114,2s  104,0s 1.098x
> 12  125,8s  115,7s 1.087x
>
> Signed-off-by: J. Dekker <jdek at itanimul.li>
> ---
> libavcodec/aarch64/hevcdsp_deblock_neon.S | 421 ++++++++++++++++++++++
> libavcodec/aarch64/hevcdsp_init_aarch64.c |  18 +
> 2 files changed, 439 insertions(+)

> +0:      // STRONG FILTER
> +
> +        // P0 = p0 + av_clip(((p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3) - p0, -tc3, tc3);
> +        add             v21.8h, v2.8h, v3.8h   // (p1 + p0
> +        add             v21.8h, v4.8h, v21.8h  //     + q0)
> +        shl             v21.8h, v21.8h, #1     //           * 2
> +        add             v22.8h, v1.8h, v5.8h   //   (p2 + q1)
> +        add             v21.8h, v22.8h, v21.8h // +
> +        srshr            v21.8h, v21.8h, #3     //               >> 3
> +        sub             v21.8h, v21.8h, v3.8h  //                    - p0
> +

The srshr line is incorrectly indented here (and elsewhere); its operands 
start one column further right than on the surrounding lines.

> +        sqxtun          v4.8b, v4.8h
> +        sqxtun          v5.8b, v5.8h
> +        sqxtun          v6.8b, v6.8h
> +        sqxtun          v7.8b, v7.8h
> +.endif
> +        ret
> +3:      ret x6

Please align the "x6" here with the other operands.

> +.macro hevc_loop_filter_luma dir bitdepth
> +function ff_hevc_\dir\()_loop_filter_luma_\bitdepth\()_neon, export=1
> +        mov             x6, x30
> +.if \dir == v

In GAS, .if evaluates a numerical expression; it can't do string 
comparisons.

The right way to do this is ".ifc \dir, v", which does a string 
comparison.

(If you really do need to do this like a numerical comparison, it's 
possible to define e.g. "v" as a numeric symbol as well, see e.g. 
https://code.videolan.org/videolan/dav1d/-/merge_requests/1603/diffs?commit_id=d4746c908c56cb2e8545efd348b8cdc13f2f2253 
but that's not really the nicest way to do it.)
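
As a minimal sketch of the string comparison suggested above (the macro 
name select_dir and the labels example_v/example_h are hypothetical, made 
up for illustration, and not part of the patch), .ifc picks a branch at 
assembly time based on the text of the macro argument:

```asm
// Hypothetical macro for illustration only; not taken from the patch.
.macro select_dir dir
.ifc \dir, v
        mov             w0, #1          // assembled only when \dir is the text "v"
.else
        mov             w0, #0          // assembled for any other text, e.g. "h"
.endif
        ret
.endm

        .text
        .globl          example_v
example_v:
        select_dir      v               // expands to: mov w0, #1; ret

        .globl          example_h
example_h:
        select_dir      h               // expands to: mov w0, #0; ret
```

Only one of the two mov instructions ends up in each expansion, since the 
choice is made when the macro is expanded, not at runtime.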

This issue breaks compilation with Clang. With gas-preprocessor (for 
MSVC), it builds without errors, but does the wrong thing.


To avoid having to test all these corner-case build configurations 
manually, and having to remember to check indentation and the like, I've 
set up a PoC for testing such things on GitHub Actions.

If you have a repo on GitHub, grab my commits from 
https://github.com/mstorsjo/FFmpeg/commits/gha-aarch64 (there are a couple 
of them), add your changes on top of them, push the result as a branch to 
your own GitHub repo, and then check the output from the actions.

Here's the output of a run with the patches you just posted: 
https://github.com/mstorsjo/FFmpeg/actions/runs/7988312683

// Martin


