[FFmpeg-devel] [PATCH 1/6] lavc/aarch64: new optimization for 8-bit hevc_pel_bi_pixels

Martin Storsjö martin at martin.st
Fri Dec 1 20:09:07 EET 2023


On Sat, 18 Nov 2023, Logan.Lyu wrote:

> diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S
> index 708b903b00..74165273d7 100644
> --- a/libavcodec/aarch64/hevcdsp_epel_neon.S
> +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S
> @@ -244,6 +244,185 @@ function ff_hevc_put_hevc_pel_pixels64_8_neon, export=1
> endfunc
> +function ff_hevc_put_hevc_pel_bi_pixels4_8_neon, export=1
> +        mov             x10, #(MAX_PB_SIZE * 2)
> +1:      ld1             {v0.s}[0], [x2], x3 // src
> +        ushll           v16.8h, v0.8b, #6
> +        ld1             {v20.4h}, [x4], x10 // src2
> +        sqadd           v16.8h, v16.8h, v20.8h
> +        sqrshrun        v0.8b,  v16.8h, #7
> +        st1             {v0.s}[0], [x0], x1
> +        subs            w5, w5, #1
> +        b.ne            1b

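In many of these functions, the "subs" instruction could be scheduled 
better: either right after the ld1, or between sqrshrun and st1, where it 
fills a slot in which the following instruction would otherwise wait for 
the previous result. It probably doesn't matter much on out-of-order 
cores, but if you have access to an in-order core, you might gain a cycle 
per iteration here.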
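
For readers less familiar with the NEON mnemonics, here is a rough scalar C 
model of what the quoted 4-pixel loop computes per row. It is only a 
sketch: the function and helper names are made up for illustration, and 
MAX_PB_SIZE is assumed to be 64. The scheduling remark above concerns only 
where the height decrement sits relative to these steps, not the 
arithmetic itself.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PB_SIZE 64 /* assumption: src2 row pitch in int16_t elements */

/* Models sqadd on one 16-bit lane: signed saturating add. */
static int16_t sat_add_s16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Models sqrshrun #7 on one lane: rounding shift right by 7,
 * then saturate to the unsigned 8-bit range. */
static uint8_t round_shift7_sat_u8(int16_t v)
{
    int32_t t = (int32_t)v + 64; /* rounding constant 1 << (7 - 1) */
    if (t < 0) return 0;
    t >>= 7;
    return t > 255 ? 255 : (uint8_t)t;
}

/* Hypothetical scalar equivalent of the 4-pixel-wide loop.
 * Assumes height >= 1; the asm loop always runs at least once. */
static void pel_bi_pixels4_model(uint8_t *dst, ptrdiff_t dststride,
                                 const uint8_t *src, ptrdiff_t srcstride,
                                 const int16_t *src2, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < 4; x++) {
            int16_t wide = (int16_t)(src[x] << 6);      /* ushll #6 */
            dst[x] = round_shift7_sat_u8(sat_add_s16(wide, src2[x]));
        }
        src  += srcstride;    /* post-increment of x2 by x3 */
        src2 += MAX_PB_SIZE;  /* x10 = MAX_PB_SIZE * 2 bytes */
        dst  += dststride;    /* post-increment of x0 by x1 */
    }
}
```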

> diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
> index c51488275c..cf171023e7 100644
> --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
> +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
> @@ -156,8 +156,12 @@ NEON8_FNPROTO(pel_pixels, (int16_t *dst,
>         const uint8_t *src, ptrdiff_t srcstride,
>         int height, intptr_t mx, intptr_t my, int width),);
> -NEON8_FNPROTO(epel_v, (int16_t *dst,
> -        const uint8_t *src, ptrdiff_t srcstride,
> +NEON8_FNPROTO(pel_bi_pixels, (uint8_t *dst, ptrdiff_t dststride,
> +        const uint8_t *_src, ptrdiff_t _srcstride, const int16_t *src2,
> +        int height, intptr_t mx, intptr_t my, int width),);
> +
> +NEON8_FNPROTO(epel_v, (uint8_t *dst, ptrdiff_t dststride,
> +        const uint8_t *_src, ptrdiff_t _srcstride, const int16_t *src2,

Here, you're breaking the interface of the existing prototypes for epel_v: 
the patch replaces their parameter list with the one for the new bi 
functions. Depending on the compiler, this causes either warnings or, with 
modern Clang, errors. Please pay attention to potential warnings in the 
files you edit when authoring a new patch.
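
A minimal sketch of the kind of mismatch meant here, using made-up 
stand-in names rather than the real FFmpeg declarations (the parameter 
list of the original epel_v prototype is assumed to match the pel_pixels 
one quoted above). Assigning a function declared with the bi signature to 
a pointer of the non-bi type is an incompatible pointer type conversion, 
which compilers report as a warning or an error depending on version and 
flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pointer type for the existing (non-bi) functions. */
typedef void (*demo_put_fn)(int16_t *dst,
                            const uint8_t *src, ptrdiff_t srcstride,
                            int height, intptr_t mx, intptr_t my, int width);

/* Prototype that matches the pointer type above. */
void demo_epel_v(int16_t *dst,
                 const uint8_t *src, ptrdiff_t srcstride,
                 int height, intptr_t mx, intptr_t my, int width)
{
    /* Empty body; only the signature matters for this illustration. */
    (void)dst; (void)src; (void)srcstride;
    (void)height; (void)mx; (void)my; (void)width;
}

/* Prototype with the bi signature, as the patch gives to epel_v. */
void demo_epel_bi_v(uint8_t *dst, ptrdiff_t dststride,
                    const uint8_t *src, ptrdiff_t srcstride,
                    const int16_t *src2,
                    int height, intptr_t mx, intptr_t my, int width);

struct demo_dsp {
    demo_put_fn put_epel;
};

static void demo_init(struct demo_dsp *c)
{
    c->put_epel = demo_epel_v;       /* types agree, no diagnostic */
    /* c->put_epel = demo_epel_bi_v;    incompatible pointer types */
}
```

Uncommenting the second assignment reproduces the diagnostic in 
isolation.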

// Martin


