[FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19

Martin Storsjö martin at martin.st
Mon Oct 24 16:19:18 EEST 2022


On Mon, 17 Oct 2022, Hubert Mazur wrote:

> Provide arm64 neon optimized implementations for hscale16To19 with
> filter sizes 4, 8 and X4.
>
> The tests and benchmarks run on AWS Graviton 2 instances.
> The results from a checkasm tool are shown below.
>
> hscale_16_to_19__fs_4_dstW_512_c: 6216.0
> hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
> hscale_16_to_19__fs_8_dstW_512_c: 10417.7
> hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
> hscale_16_to_19__fs_12_dstW_512_c: 14890.5
> hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
> hscale_16_to_19__fs_16_dstW_512_c: 19006.5
> hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
> hscale_16_to_19__fs_32_dstW_512_c: 36629.5
> hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
> hscale_16_to_19__fs_40_dstW_512_c: 45477.5
> hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
>
> Signed-off-by: Hubert Mazur <hum at semihalf.com>
> ---
> libswscale/aarch64/hscale.S  | 402 +++++++++++++++++++++++++++++++++++
> libswscale/aarch64/swscale.c |  70 +++++-
> 2 files changed, 471 insertions(+), 1 deletion(-)

> +void ff_hscale16to19_4_neon_asm(int shift, int16_t *_dst, int dstW,
> +                      const uint8_t *_src, const int16_t *filter,
> +                      const int32_t *filterPos, int filterSize);
> +void ff_hscale16to19_X8_neon_asm(int shift, int16_t *_dst, int dstW,
> +                      const uint8_t *_src, const int16_t *filter,
> +                      const int32_t *filterPos, int filterSize);
> +void ff_hscale16to19_X4_neon_asm(int shift, int16_t *_dst, int dstW,
> +                      const uint8_t *_src, const int16_t *filter,
> +                      const int32_t *filterPos, int filterSize);
> +
> #define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
> void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
>                                                 SwsContext *c, int16_t *data, \
> @@ -43,7 +53,8 @@ void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
> #define SCALE_FUNCS(filter_n, opt) \
>     SCALE_FUNC(filter_n,  8, 15, opt); \
>     SCALE_FUNC(filter_n, 8, 19, opt); \
> -    SCALE_FUNC(filter_n, 16, 15, opt);
> +    SCALE_FUNC(filter_n, 16, 15, opt); \
> +    SCALE_FUNC(filter_n, 16, 19, opt);

So this declares the functions we're implementing as C wrappers below, and 
the manual declarations further up declare the actual asm functions?

I guess that works, although it creates unnecessary extern functions. In
such cases, we usually make the C wrappers static functions, placed
above the code that uses them. But it's not a big deal.
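
As a rough sketch of that convention (not the actual swscale code): only
the asm routine has external linkage, while the C wrapper is static and
is defined above the init code that installs it. All names here
(DemoContext, demo_hscale_asm, demo_hscale_wrapper, demo_init) are
hypothetical, and the asm routine is replaced by a plain C stand-in so
that the example builds on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SwsContext: a shift and a function pointer. */
typedef struct DemoContext {
    int shift;
    void (*hscale)(struct DemoContext *c, int32_t *dst, int dstW,
                   const uint16_t *src);
} DemoContext;

/* Declaration of the routine that would be implemented in assembly. */
void demo_hscale_asm(int shift, int32_t *dst, int dstW, const uint16_t *src);

/* Plain C stand-in for the asm body, only so that this sketch links. */
void demo_hscale_asm(int shift, int32_t *dst, int dstW, const uint16_t *src)
{
    for (int i = 0; i < dstW; i++)
        dst[i] = src[i] >> shift;
}

/* Static wrapper: internal linkage, so no extern declaration is needed. */
static void demo_hscale_wrapper(DemoContext *c, int32_t *dst, int dstW,
                                const uint16_t *src)
{
    demo_hscale_asm(c->shift, dst, dstW, src);
}

/* The code that uses the wrapper comes after the wrapper's definition. */
void demo_init(DemoContext *c, int shift)
{
    assert(shift >= 0 && shift < 16);
    c->shift  = shift;
    c->hscale = demo_hscale_wrapper;
}
```

With this layout the wrapper does not need a separate prototype, and
only the asm symbol is visible outside the file.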

Other than that, this patchset mostly seems fine.

However, I tested the patches on x86, and the new checkasm tests fail
there (on both i386 and x86_64), so that needs to be fixed. Since we'll
need another round anyway, please also fix up the minor cosmetics I
mentioned.

// Martin


