[FFmpeg-devel] [PATCH v2 0/7] arm64 neon implementation for 8bits functions

Tue Oct 4 14:34:00 EEST 2022

Great!! Thanks a lot for your help and your review.
thanks,
greg

On Tue, 4 Oct 2022 at 12:57, Martin Storsjö <martin at martin.st> wrote:

> On Mon, 3 Oct 2022, Grzegorz Bernacki wrote:
>
> > Changes since v1:
> >
> > - changed tabs to spaces
> > - modified branch instruction in vsse8
> > - applied Martin's patches with improved instruction scheduling
> >
> > Grzegorz Bernacki (4):
> >  lavc/aarch64: Add neon implementation for pix_abs8 functions.
> >  lavc/aarch64: Provide neon implementation of nsse8
> >  lavc/aarch64: Provide optimized implementation of vsse8 for arm64.
> >  lavc/aarch64: Add neon implementation for vsse_intra8
> >
> > Martin Storsjö (3):
> >  aarch64: me_cmp: Improve scheduling in ff_pix_abs8_y2_neon
> >  aarch64: me_cmp: Fix up the prologue of ff_pix_abs8_xy2_neon
> >  aarch64: me_cmp: Improve scheduling in vsse_intra8
> >
> > libavcodec/aarch64/me_cmp_init_aarch64.c |  33 ++
> > libavcodec/aarch64/me_cmp_neon.S         | 414 +++++++++++++++++++++++
> > 2 files changed, 447 insertions(+)
>
> Thanks! This mostly looked good to me.
>
> I had actually meant that you would squash my fixes into your patches,
> instead of keeping them as separate ones.
>
> After squashing such changes, it might have been interesting to get
> updated benchmarks in those commit messages (the ones that you have from
> Graviton 3). However in this case, these changes didn't really make much
> difference on out-of-order cores, only on in-order cores, so I guess
> there's not that much value in getting updated benchmarks from Graviton 3
> in this case.
>
> So I went ahead and squashed those patches (and added co-authored-by lines
> where relevant), and pushed them now. Thanks for your contribution!
>
> // Martin
>