[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs
flow gg
hlefthleft at gmail.com
Wed Feb 7 02:01:23 EET 2024
I think that holds in most cases, but for this particular function, reducing
only once at the end turns out to be slower.
The currently submitted version takes roughly (checkasm benchmark, lower is better):
pix_abs_0_0_rvv_i32: 136.2
The version that reduces only once takes (about 24% longer):
pix_abs_0_0_rvv_i32: 169.2
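For reference, here is a scalar C model of the two accumulation strategies being compared. It assumes (the thread does not show it) that the submitted version reduces each row's 16 absolute differences to a scalar, i.e. h reductions, whereas the alternative keeps 16 per-lane accumulators and reduces once. Both must return the same sum of absolute differences. The function names are hypothetical, not FFmpeg API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of per-row reduction: each row's 16 absolute
 * differences are summed into a scalar (one reduction per row). */
static int sad16_reduce_per_row(const uint8_t *p1, const uint8_t *p2,
                                ptrdiff_t stride, int h)
{
    assert(h >= 0);
    int sum = 0;
    for (int y = 0; y < h; y++) {
        int row = 0;                      /* one "reduction" per row */
        for (int x = 0; x < 16; x++)
            row += abs(p1[x] - p2[x]);
        sum += row;
        p1 += stride;
        p2 += stride;
    }
    return sum;
}

/* Model of the reduce-once variant: 16 per-lane accumulators, summed
 * across lanes a single time after the row loop. */
static int sad16_reduce_once(const uint8_t *p1, const uint8_t *p2,
                             ptrdiff_t stride, int h)
{
    assert(h >= 0);
    uint32_t lane[16] = { 0 };
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            lane[x] += (uint32_t)abs(p1[x] - p2[x]);
        p1 += stride;
        p2 += stride;
    }
    uint32_t sum = 0;                     /* the single final reduction */
    for (int x = 0; x < 16; x++)
        sum += lane[x];
    return (int)sum;
}
```

The model only pins down the semantics; which strategy is faster on a given core is an empirical question, as the timings above show.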
Here is the implementation of the version that reduces only once:
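func ff_pix_abs16_temp_rvv, zve32x
        vsetivli    zero, 16, e32, m4, ta, ma   # vl = 16 (needs VLEN >= 128)
        vmv.v.i     v24, 0                      # 16 x 32-bit per-lane sums
        vmv.s.x     v0, zero                    # reduction seed = 0
1:
        vsetvli     zero, zero, e8, m1, tu, ma  # same SEW/LMUL ratio, vl stays 16
        vle8.v      v4, (a1)                    # row of pix1
        vle8.v      v12, (a2)                   # row of pix2
        addi        a4, a4, -1                  # h--
        vwsubu.vv   v16, v4, v12                # pix1 - pix2 as 16-bit
        add         a1, a1, a3                  # pix1 += stride
        vwsubu.vv   v20, v12, v4                # pix2 - pix1 as 16-bit
        vsetvli     zero, zero, e16, m2, tu, ma
        vmax.vv     v16, v16, v20               # signed max = |pix1 - pix2|
        add         a2, a2, a3                  # pix2 += stride
        vwadd.wv    v24, v24, v16               # per-lane accumulate, no reduction
        bnez        a4, 1b
        vsetvli     zero, zero, e32, m4, ta, ma
        vredsum.vs  v0, v24, v0                 # one reduction; sum fits in 32 bits (Zve32x has no 64-bit elements)
        vmv.x.s     a0, v0
        ret
endfunc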
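The loop avoids a dedicated absolute-difference instruction: it computes both widened differences with vwsubu.vv and keeps their signed maximum with vmax.vv. The scalar C sketch below models one lane of that trick (helper names are hypothetical) and checks it exhaustively over all byte pairs. It also records the bound on the result: at most 16 * 255 * h, i.e. 65280 for h = 16, so a 32-bit sum cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of vwsubu.vv (SEW=8): zero-extend both bytes to 16 bits and
 * subtract modulo 2^16, so 0 - 255 yields 0xFF01. */
static uint16_t wsubu8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a - (uint16_t)b);
}

/* One lane of vmax.vv at SEW=16, which compares as signed. Converting
 * values above INT16_MAX to int16_t is implementation-defined before C23;
 * GCC and Clang wrap modulo 2^16, like the vector unit. */
static int16_t max_s16(uint16_t x, uint16_t y)
{
    int16_t sx = (int16_t)x, sy = (int16_t)y;
    return sx > sy ? sx : sy;
}

/* |a - b| as the loop computes it: signed max of (a - b) and (b - a). */
static int16_t absdiff_lane(uint8_t a, uint8_t b)
{
    return max_s16(wsubu8(a, b), wsubu8(b, a));
}

/* Exhaustive check of the identity over all 65536 byte pairs. */
static int absdiff_lane_is_exact(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++) {
            int expect = a > b ? a - b : b - a;
            if (absdiff_lane((uint8_t)a, (uint8_t)b) != expect)
                return 0;
        }
    return 1;
}

/* Largest possible result for a 16-pixel-wide block of h rows. */
static uint32_t sad16_max(int h)
{
    assert(h >= 0);
    return 16u * 255u * (uint32_t)h;
}
```

Each 32-bit lane accumulator holds at most 255 * h, so the per-lane sums are far from overflowing as well.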
Rémi Denis-Courmont <remi at remlab.net> wrote on Wed, 7 Feb 2024 at 00:58:
> Hi,
>
> To sum a vector, you should only reduce once at the end of the function,
> cf. how it's done in existing scalar products. Reduction instructions are
> (intrinsically) slow.
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/