[FFmpeg-devel] [PATCH v2 1/1] lavc/aarch64: add some neon pix_abs functions

Swinney, Jonathan jswinney at amazon.com
Tue Apr 26 01:43:25 EEST 2022


Thanks to Michael and Martin for your reviews on several of my patches. I've made many of the changes you requested, but I'm not yet ready to resubmit the patches. I'll be out of the office until next week and will submit updated versions then. Thanks!

-- 

Jonathan Swinney

On 4/15/22, 11:45 AM, "ffmpeg-devel on behalf of Michael Niedermayer" <ffmpeg-devel-bounces at ffmpeg.org on behalf of michael at niedermayer.cc> wrote:


    On Thu, Apr 14, 2022 at 04:22:58PM +0000, Swinney, Jonathan wrote:
    >  - ff_pix_abs16_neon
    >  - ff_pix_abs16_xy2_neon
    >
    > In direct micro benchmarks of these ff functions versus their C implementations,
    > these functions performed as follows on AWS Graviton 2:
    >
    > ff_pix_abs16_neon:
    > c:  benchmark ran 100000 iterations in 0.955383 seconds
    > ff: benchmark ran 100000 iterations in 0.097669 seconds
    >
    > ff_pix_abs16_xy2_neon:
    > c:  benchmark ran 100000 iterations in 1.916759 seconds
    > ff: benchmark ran 100000 iterations in 0.370729 seconds
    >
    > Signed-off-by: Jonathan Swinney <jswinney at amazon.com>
    > ---
    >  libavcodec/aarch64/Makefile              |   2 +
    >  libavcodec/aarch64/me_cmp_init_aarch64.c |  39 +++++
    >  libavcodec/aarch64/me_cmp_neon.S         | 209 +++++++++++++++++++++++
    >  libavcodec/me_cmp.c                      |   2 +
    >  libavcodec/me_cmp.h                      |   1 +
    >  libavcodec/x86/me_cmp.asm                |   7 +
    >  libavcodec/x86/me_cmp_init.c             |   3 +
    >  tests/checkasm/Makefile                  |   2 +-
    >  tests/checkasm/checkasm.c                |   1 +
    >  tests/checkasm/checkasm.h                |   1 +
    >  tests/checkasm/motion.c                  | 155 +++++++++++++++++
    >  11 files changed, 421 insertions(+), 1 deletion(-)
    >  create mode 100644 libavcodec/aarch64/me_cmp_init_aarch64.c
    >  create mode 100644 libavcodec/aarch64/me_cmp_neon.S
    >  create mode 100644 tests/checkasm/motion.c
    >
    [...]
    > diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
    > index ad06d485ab..f73b9f9161 100644
    > --- a/libavcodec/x86/me_cmp.asm
    > +++ b/libavcodec/x86/me_cmp.asm
    > @@ -255,6 +255,7 @@ hadamard8x8_diff %+ SUFFIX:
    >
    >      HSUM                         m0, m1, eax
    >      and                         rax, 0xFFFF
    > +    emms
    >      ret
    >
    >  hadamard8_16_wrapper 0, 14
    > @@ -345,6 +346,7 @@ cglobal sse%1, 5,5,8, v, pix1, pix2, lsize, h
    >
    >      HADDD     m7, m1
    >      movd     eax, m7         ; return value
    > +    emms
    >      RET
    >  %endmacro

    On which ARM chip did you test this?


    [...]
    > diff --git a/libavcodec/x86/me_cmp_init.c b/libavcodec/x86/me_cmp_init.c
    > index 9af911bb88..b330868a38 100644
    > --- a/libavcodec/x86/me_cmp_init.c
    > +++ b/libavcodec/x86/me_cmp_init.c
    > @@ -186,6 +186,8 @@ static int vsad_intra16_mmx(MpegEncContext *v, uint8_t *pix, uint8_t *dummy,
    >          : "r" (stride), "m" (h)
    >          : "%ecx");
    >
    > +    emms_c();
    > +
    >      return tmp & 0xFFFF;
    >  }
    >  #undef SUM
    > @@ -418,6 +420,7 @@ static inline int sum_mmx(void)
    >          "paddw %%mm0, %%mm6             \n\t"
    >          "movd %%mm6, %0                 \n\t"
    >          : "=r" (ret));
    > +    emms_c();
    >      return ret & 0xFFFF;
    >  }

    hmmm

    Also before the patch
    checkasm: all 6153 tests passed
    after it
    checkasm: all 3198 tests passed

    That's on x86-64.

    [...]

    --
    Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

    Complexity theory is the science of finding the exact solution to an
    approximation. Benchmarking OTOH is finding an approximation of the exact
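
For readers unfamiliar with the functions named in the quoted patch: pix_abs16 is, as far as I understand it, a sum of absolute differences (SAD) over a block 16 pixels wide and h rows tall, and the xy2 variant interpolates the second block at a half-pixel offset before taking the differences. The sketch below covers only the plain SAD case. The name sad16_ref and its signature are made up for illustration; this is not the FFmpeg C implementation, just a minimal scalar version of the same idea that an assembly routine could be compared against.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical scalar reference (not FFmpeg's code): sum of absolute
 * differences between two blocks that are 16 pixels wide and h rows tall.
 * Both blocks are assumed to share the same stride, given in bytes.
 */
static int sad16_ref(const uint8_t *pix1, const uint8_t *pix2,
                     ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        /* Accumulate |pix1 - pix2| across one 16-pixel row. */
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - pix2[x]);

        /* Advance both pointers to the next row. */
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

A vectorized version would process a whole row per iteration instead of one pixel at a time, which is where speedups like the ones reported in the commit message come from.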


