[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs

Wed Feb 7 02:01:23 EET 2024

I agree that this holds in most cases, but for this particular function,
reducing only once turns out to be slower.

The currently submitted version takes roughly:
pix_abs_0_0_rvv_i32: 136.2

The version that reduces only once takes:
pix_abs_0_0_rvv_i32: 169.2

Here is the implementation of the single-reduction version:

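func ff_pix_abs16_temp_rvv, zve32x
        vsetivli        zero, 16, e32, m4, ta, ma
        vmv.v.i         v24, 0          // 16 x 32-bit lane accumulators
        vmv.s.x         v0, zero        // reduction seed = 0
1:
        vsetvli         zero, zero, e8, m1, tu, ma
        vle8.v          v4, (a1)        // 16 bytes of blk1
        vle8.v          v12, (a2)       // 16 bytes of blk2
        addi            a4, a4, -1
        vwsubu.vv       v16, v4, v12    // a - b, widened to 16 bits
        add             a1, a1, a3
        vwsubu.vv       v20, v12, v4    // b - a, widened to 16 bits
        vsetvli         zero, zero, e16, m2, tu, ma
        vmax.vv         v16, v16, v20   // |a - b| via signed max
        add             a2, a2, a3
        vwadd.wv        v24, v24, v16   // accumulate into 32-bit lanes
        bnez            a4, 1b

        vsetvli         zero, zero, e32, m4, ta, ma
        vredsum.vs      v0, v24, v0     // single 32-bit reduction (no e64 needed)
        vmv.x.s         a0, v0
        ret
endfunc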
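For comparison, here is a scalar C model of the two strategies being discussed. It is only a sketch: the function names `sad16_reduce_per_row` and `sad16_reduce_once` are hypothetical, and the "reduce every row" variant is an assumption about the general shape of the submitted patch, not a copy of it. Both compute the sum of absolute differences of a 16-pixel-wide block over `h` rows, forming |a - b| as max(a - b, b - a) like the assembly above; they differ only in how often the per-lane values are collapsed into a scalar.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* one 16-byte row per iteration, as in pix_abs16 */

/* |a - b| as max(a - b, b - a) in signed 16-bit lanes, mirroring the
 * vwsubu.vv / vwsubu.vv / vmax.vv sequence. */
static int16_t absdiff_lane(uint8_t a, uint8_t b)
{
    int16_t d0 = (int16_t)(a - b);
    int16_t d1 = (int16_t)(b - a);
    return d0 > d1 ? d0 : d1;
}

/* Strategy 1: collapse the lane vector into a scalar on every row. */
int sad16_reduce_per_row(const uint8_t *blk1, const uint8_t *blk2,
                         ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        int row = 0;
        for (int i = 0; i < LANES; i++) /* per-row "reduction" */
            row += absdiff_lane(blk1[i], blk2[i]);
        sum += row;
        blk1 += stride;
        blk2 += stride;
    }
    return sum;
}

/* Strategy 2: keep 16 per-lane 32-bit accumulators, reduce once. */
int sad16_reduce_once(const uint8_t *blk1, const uint8_t *blk2,
                      ptrdiff_t stride, int h)
{
    int32_t acc[LANES] = {0}; /* plays the role of v24 in the asm */
    for (int y = 0; y < h; y++) {
        for (int i = 0; i < LANES; i++) /* lane-wise, no reduction */
            acc[i] += absdiff_lane(blk1[i], blk2[i]);
        blk1 += stride;
        blk2 += stride;
    }
    int sum = 0;
    for (int i = 0; i < LANES; i++) /* the single final reduction */
        sum += acc[i];
    return sum;
}
```

Both return identical sums; the difference is h reductions versus one. The C model says nothing about speed: whether the extra per-row reductions cost more than the wider accumulation and extra vsetvli switches depends on the hardware, which is what the benchmarks above measure.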

On Wed, Feb 7, 2024 at 00:58, Rémi Denis-Courmont <remi at remlab.net> wrote:

> Hi,
>
> To sum a vector, you should only reduce once at the end of the function,
> cf. how it's done in existing scalar products. Reduction instructions are
> (intrinsically) slow.
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/