[FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19
Martin Storsjö
martin at martin.st
Mon Oct 24 16:19:18 EEST 2022
On Mon, 17 Oct 2022, Hubert Mazur wrote:
> Provide arm64 neon optimized implementations for hscale16To19 with
> filter sizes 4, 8 and X4.
>
> The tests and benchmarks run on AWS Graviton 2 instances.
> The results from a checkasm tool are shown below.
>
> hscale_16_to_19__fs_4_dstW_512_c: 6216.0
> hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
> hscale_16_to_19__fs_8_dstW_512_c: 10417.7
> hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
> hscale_16_to_19__fs_12_dstW_512_c: 14890.5
> hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
> hscale_16_to_19__fs_16_dstW_512_c: 19006.5
> hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
> hscale_16_to_19__fs_32_dstW_512_c: 36629.5
> hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
> hscale_16_to_19__fs_40_dstW_512_c: 45477.5
> hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
>
> Signed-off-by: Hubert Mazur <hum at semihalf.com>
> ---
> libswscale/aarch64/hscale.S | 402 +++++++++++++++++++++++++++++++++++
> libswscale/aarch64/swscale.c | 70 +++++-
> 2 files changed, 471 insertions(+), 1 deletion(-)
> +void ff_hscale16to19_4_neon_asm(int shift, int16_t *_dst, int dstW,
> +                                const uint8_t *_src, const int16_t *filter,
> +                                const int32_t *filterPos, int filterSize);
> +void ff_hscale16to19_X8_neon_asm(int shift, int16_t *_dst, int dstW,
> +                                 const uint8_t *_src, const int16_t *filter,
> +                                 const int32_t *filterPos, int filterSize);
> +void ff_hscale16to19_X4_neon_asm(int shift, int16_t *_dst, int dstW,
> +                                 const uint8_t *_src, const int16_t *filter,
> +                                 const int32_t *filterPos, int filterSize);
> +
> #define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
> void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
> SwsContext *c, int16_t *data, \
> @@ -43,7 +53,8 @@ void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
> #define SCALE_FUNCS(filter_n, opt) \
> SCALE_FUNC(filter_n, 8, 15, opt); \
> SCALE_FUNC(filter_n, 8, 19, opt); \
> - SCALE_FUNC(filter_n, 16, 15, opt);
> + SCALE_FUNC(filter_n, 16, 15, opt); \
> + SCALE_FUNC(filter_n, 16, 19, opt);
So this declares the functions we're implementing as C wrappers below, while
the manual declarations further up declare the actual asm functions?
I guess that works, although it leaves the C wrappers as needlessly extern
(non-static) functions. In such cases, we usually make the C wrappers static
functions, placed above the code that uses them. But it's not a big deal.
Other than that, this patchset mostly seems fine.
However, I tested the patches on x86, and the new checkasm tests fail
on x86 (both i386 and x86_64), so that needs to be fixed. Since we'll
need a new round anyway, please also try to fix up the minor cosmetics
I mentioned.
// Martin