[FFmpeg-devel] [PATCH 3/3] avcodec/aarch64: add hevc deblock NEON
Martin Storsjö
martin at martin.st
Wed Feb 21 14:08:16 EET 2024
On Wed, 21 Feb 2024, J. Dekker wrote:
> Benched using single-threaded full decode on an Ampere Altra.
>
> Bpp  Before   After    Speedup
> 8    73,3s    65,2s    1.124x
> 10   114,2s   104,0s   1.098x
> 12   125,8s   115,7s   1.087x
>
> Signed-off-by: J. Dekker <jdek at itanimul.li>
> ---
> libavcodec/aarch64/hevcdsp_deblock_neon.S | 421 ++++++++++++++++++++++
> libavcodec/aarch64/hevcdsp_init_aarch64.c | 18 +
> 2 files changed, 439 insertions(+)
> +0: // STRONG FILTER
> +
> + // P0 = p0 + av_clip(((p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3) - p0, -tc3, tc3);
> + add v21.8h, v2.8h, v3.8h // (p1 + p0
> + add v21.8h, v4.8h, v21.8h // + q0)
> + shl v21.8h, v21.8h, #1 // * 2
> + add v22.8h, v1.8h, v5.8h // (p2 + q1)
> + add v21.8h, v22.8h, v21.8h // +
> + srshr v21.8h, v21.8h, #3 // >> 3
> + sub v21.8h, v21.8h, v3.8h // - p0
> +

The srshr line is incorrectly indented here (and elsewhere).

> + sqxtun v4.8b, v4.8h
> + sqxtun v5.8b, v5.8h
> + sqxtun v6.8b, v6.8h
> + sqxtun v7.8b, v7.8h
> +.endif
> + ret
> +3: ret x6

Please indent the "x6" here like the other operands.

> +.macro hevc_loop_filter_luma dir bitdepth
> +function ff_hevc_\dir\()_loop_filter_luma_\bitdepth\()_neon, export=1
> + mov x6, x30
> +.if \dir == v

In the GAS assembler, .if does a numerical comparison; it can't do string
comparisons.

The right way to do this is ".ifc \dir, v", which does a string
comparison.

(If you really do need to do this as a numerical comparison, it's
possible to define e.g. "v" as a numeric symbol as well, see e.g.
https://code.videolan.org/videolan/dav1d/-/merge_requests/1603/diffs?commit_id=d4746c908c56cb2e8545efd348b8cdc13f2f2253
but that's not really the nicest way to do it.)
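
As a minimal sketch of the .ifc form (the macro name "pick_dir" and the
mov instructions are made up for illustration, they are not from the
patch):

```asm
// Hypothetical macro, for illustration only; not part of the patch.
.macro pick_dir dir
.ifc \dir, v
        mov             w0, #1          // emitted when \dir is textually "v"
.else
        mov             w0, #0          // emitted for any other argument, e.g. "h"
.endif
.endm

        pick_dir        v               // expands to the "mov w0, #1" branch
        pick_dir        h               // expands to the "mov w0, #0" branch
```

Since .ifc compares the expanded text, this doesn't rely on "v" or "h"
existing as symbols.
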

This issue breaks compilation with Clang. With gas-preprocessor (for
MSVC), it manages to build, but does the wrong thing.

To avoid having to test all these build configurations manually, and
having to remember to check every corner case configuration, the
indentation and all, I've set up a PoC for testing such things on GitHub
Actions.

If you have a repo on GitHub, grab my commits from
https://github.com/mstorsjo/FFmpeg/commits/gha-aarch64 (there are a couple
of them), add your changes on top of these, push it as a branch to
your own GitHub repo, then check the output from the actions.

Here's the output of a run with the patches you just posted:
https://github.com/mstorsjo/FFmpeg/actions/runs/7988312683

// Martin