[FFmpeg-devel] [PATCH 00/21] aarch64: hevc: Add missing hevc_pel NEON functions

Martin Storsjö martin at martin.st
Mon Mar 25 17:02:22 EET 2024


Hi,

Since some time, we have pretty complete AArch64 NEON coverage
for the hevc decoder.

However, some of these functions require the I8MM instruction set
extension, and many of them (but not all) lack a plain NEON
version.

This patchset fills in a regular NEON version of all functions
where we have an I8MM function.

For context; the I8MM instruction set extension is a mandatory
part of armv8.6-a. E.g. Apple M2, AWS Graviton 3 have it,
but Apple M1 and Ampere Altra don't.

This patchset takes decoding of a 1080p HEVC clip from 402
fps to 649 fps on an Apple M1.

Patch #2 also fixes a subtle bug in the existing implementation;
two functions relied on the contents on the stack, below the
stack pointer, being untouched within a function. If a signal
gets delivered, those parts of the stack could be clobbered.

// Martin

Martin Storsjö (21):
  aarch64: hevc: Reorder a misplaced function init line
  aarch64: hevc: Don't iterate with sp in
    ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm
  aarch64: hevc: Merge consecutive stores in
    put_hevc_\type\()_h16_8_neon
  aarch64: hevc: Specialize put_hevc_\type\()_h*_8_neon for horizontal
    looping
  aarch64: hevc: Use ld1r instead of ldr+dup in hevc_qpel_uni_w_h
  aarch64: hevc: Implement a neon version of put_hevc_epel_h*_8
  aarch64: hevc: Implement a neon version of hevc_epel_uni_w_h*_8
  aarch64: hevc: Split the epel_*_hv functions into two parts
  aarch64: hevc: Reorder epel_hv functions to prepare for templating
  aarch64: hevc: Produce epel_hv functions for both plain neon and i8mm
  aarch64: hevc: Produce epel_uni_hv functions for both neon and i8mm
  aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm
  aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm
  aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8
  aarch64: hevc: Split the qpel_*_hv functions into two parts
  aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon
    functions
  aarch64: hevc: Reorder qpel_hv functions to prepare for templating
  aarch64: hevc: Produce plain neon versions of qpel_hv
  aarch64: hevc: Produce plain neon versions of qpel_uni_hv
  aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv
  aarch64: hevc: Produce plain neon versions of qpel_bi_hv

 libavcodec/aarch64/hevcdsp_epel_neon.S    | 1529 +++++++++++------
 libavcodec/aarch64/hevcdsp_init_aarch64.c |   96 +-
 libavcodec/aarch64/hevcdsp_qpel_neon.S    | 1804 +++++++++++++--------
 3 files changed, 2291 insertions(+), 1138 deletions(-)

-- 
2.39.3 (Apple Git-146)



More information about the ffmpeg-devel mailing list