[FFmpeg-devel] [PATCH 0/5] Provide optimized neon implementation
Hubert Mazur
hum at semihalf.com
Thu Sep 8 12:25:02 EEST 2022
Fixed minor issues in the patches.
Regarding vsse16, I didn't change saba & umlal to sub & smlal.
It doesn't affect performance, so I left it as it was.
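As a rough illustration of why that choice cannot change the result, here is a scalar C model of the vsse16 arithmetic. It is not the NEON code. It assumes the usual vsse16 definition: the sum, over h-1 row pairs, of the squared vertical change of s1 - s2. The function names are hypothetical. Squaring discards the sign, so an absolute-difference-then-unsigned-multiply formulation (loosely mirroring saba & umlal) and a signed-subtract-then-signed-multiply formulation (loosely mirroring sub & smlal) give the same score.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model: absolute difference, then unsigned square. */
static unsigned vsse16_abs_unsigned(const uint8_t *s1, const uint8_t *s2,
                                    ptrdiff_t stride, int h)
{
    unsigned score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            /* vertical change of (s1 - s2); range is -510..510 */
            int d = s1[x] - s2[x] - s1[x + stride] + s2[x + stride];
            unsigned a = (unsigned)abs(d);
            score += a * a;
        }
        s1 += stride;
        s2 += stride;
    }
    return score;
}

/* Hypothetical scalar model: signed difference, then signed square. */
static int vsse16_signed(const uint8_t *s1, const uint8_t *s2,
                         ptrdiff_t stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int d = s1[x] - s2[x] - s1[x + stride] + s2[x + stride];
            score += d * d; /* d * d == |d| * |d| */
        }
        s1 += stride;
        s2 += stride;
    }
    return score;
}
```

Both loops visit the same h-1 row pairs; only the sign handling differs, which is why the instruction choice is a matter of taste rather than correctness.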
Most of the changes concern nsse16:
- fixed the indentation (thanks for pointing it out),
- applied Martin's patch, which fixes the balance
within instructions,
- interleaved instructions; apparently this helped a little
to achieve better benchmark results.
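For readers unfamiliar with nsse16, here is a minimal scalar sketch of the metric, assuming it follows the shape of FFmpeg's C reference in me_cmp.c. The name nsse16_ref and the explicit weight parameter are hypothetical; the real function takes the weight from the encoder context. It adds a plain sum of squared differences to a weighted penalty for how much the 2x2 "texture" of the two blocks differs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of a noise-preserving SSE over 16-wide blocks. */
static int nsse16_ref(const uint8_t *s1, const uint8_t *s2,
                      ptrdiff_t stride, int h, int weight)
{
    int score1 = 0; /* plain sum of squared differences */
    int score2 = 0; /* difference in 2x2 texture energy between blocks */

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            score1 += (s1[x] - s2[x]) * (s1[x] - s2[x]);
        if (y + 1 < h) {
            /* 15 positions: each 2x2 window needs columns x and x + 1 */
            for (int x = 0; x < 15; x++)
                score2 += abs(s1[x] - s1[x + stride] - s1[x + 1] + s1[x + stride + 1])
                        - abs(s2[x] - s2[x + stride] - s2[x + 1] + s2[x + stride + 1]);
        }
        s1 += stride;
        s2 += stride;
    }
    return score1 + abs(score2) * weight;
}
```

The two accumulators are independent, which is one reason interleaving their instruction streams in the NEON version can plausibly help.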
I have also updated the benchmark results for each function. The
performance gain is not huge, but worth the effort. The results for
nsse16 and vsse16, which changed the most, are shown below:
- vsse16 asm from 64.7 to 59.2,
- nsse16 asm from 120.0 to 116.5.
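For scale, the relative reductions implied by those figures (in whatever units the benchmark reports, which are not stated here) work out as:

```latex
\frac{64.7 - 59.2}{64.7} = \frac{5.5}{64.7} \approx 8.5\,\%,
\qquad
\frac{120.0 - 116.5}{120.0} = \frac{3.5}{120.0} \approx 2.9\,\%
```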
Hubert Mazur (5):
lavc/aarch64: Add neon implementation for vsad16
lavc/aarch64: Add neon implementation of vsse16
lavc/aarch64: Add neon implementation for vsad_intra16
lavc/aarch64: Add neon implementation for vsse_intra16
lavc/aarch64: Provide neon implementation of nsse16
libavcodec/aarch64/me_cmp_init_aarch64.c | 30 ++
libavcodec/aarch64/me_cmp_neon.S | 385 +++++++++++++++++++++++
2 files changed, 415 insertions(+)
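The other functions in the series compute vertical-difference metrics over 16-pixel-wide blocks. Below is a scalar sketch of what they are assumed to compute. The *_ref names and the simplified signatures are hypothetical; FFmpeg's real me_cmp functions use a context-taking signature, and the intra variants there receive an unused second block pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* vsad16 (assumed): sum of |vertical change of (s1 - s2)| over h-1 row pairs. */
static int vsad16_ref(const uint8_t *s1, const uint8_t *s2,
                      ptrdiff_t stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++)
            score += abs(s1[x] - s2[x] - s1[x + stride] + s2[x + stride]);
        s1 += stride;
        s2 += stride;
    }
    return score;
}

/* vsad_intra16 (assumed): sum of |row - next row| within a single block. */
static int vsad_intra16_ref(const uint8_t *s, ptrdiff_t stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++)
            score += abs(s[x] - s[x + stride]);
        s += stride;
    }
    return score;
}

/* vsse_intra16 (assumed): same as above, but squared instead of absolute. */
static int vsse_intra16_ref(const uint8_t *s, ptrdiff_t stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int d = s[x] - s[x + stride];
            score += d * d;
        }
        s += stride;
    }
    return score;
}
```

All three share the same loop structure (h-1 row pairs, 16 columns), which is what maps naturally onto one 16-byte NEON register per row.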
--
2.34.1