[FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add
Rémi Denis-Courmont
remi at remlab.net
Tue Sep 26 21:50:53 EEST 2023
On Tuesday 26 September 2023, 12:24:58 EEST, flow gg wrote:
> benchmark:
> fcmul_add_c: 19.7
> fcmul_add_rvv_f32: 6.7
+ li t1, 4
+ vsetvli t0, t1, e32, m1, ta, ma
Those two lines are just: vsetivli t0, 4, ...
But really, DO NOT use a fixed vector length here. At best, you're wasting half
the vector width. Your input has a variable size; pass it to vsetvli and loop.
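A minimal C sketch of the strip-mining pattern meant here: each pass requests up
to the remaining element count, processes however many elements it is granted,
and advances. HYPOTHETICAL_VLMAX and emulated_vsetvli() are illustrative
stand-ins for what vsetvli does at run time, and the interleaved (re, im) float
layout of sum/t/c is an assumption based on the segment loads in the patch.

```c
#include <stddef.h>

/* HYPOTHETICAL stand-in for VLMAX (elements per vector register group).
 * On real hardware it depends on VLEN, SEW and LMUL and is only known at
 * run time; 8 is an arbitrary value chosen for illustration. */
#define HYPOTHETICAL_VLMAX 8

/* Emulates vsetvli for AVL = remaining, using the simplest result the
 * V specification permits: min(remaining, VLMAX). */
static size_t emulated_vsetvli(size_t remaining)
{
    return remaining < HYPOTHETICAL_VLMAX ? remaining : HYPOTHETICAL_VLMAX;
}

/* sum[k] += t[k] * c[k] for len complex values stored as interleaved
 * (re, im) float pairs, processed strip by strip instead of with a fixed
 * count of 4. */
static void fcmul_add_stripmined(float *sum, const float *t,
                                 const float *c, size_t len)
{
    while (len > 0) {
        size_t vl = emulated_vsetvli(len); /* elements handled this pass */

        /* This inner loop stands for one pass of vector instructions. */
        for (size_t i = 0; i < vl; i++) {
            float tre = t[2 * i], tim = t[2 * i + 1];
            float cre = c[2 * i], cim = c[2 * i + 1];

            sum[2 * i]     += tre * cre - tim * cim;
            sum[2 * i + 1] += tre * cim + tim * cre;
        }

        /* Advance by vl complex values, i.e. 2 * vl floats. */
        sum += 2 * vl;
        t   += 2 * vl;
        c   += 2 * vl;
        len -= vl;
    }
}
```

With len = 19 and a VLMAX of 8, the loop runs passes of 8, 8 and 3 elements;
nothing is hard-coded to 4, and wider hardware simply takes fewer passes.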
+
+ li t2, 8
+
+ vlsseg2e32.v v0, (a1), t2
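I'm not sure what you are trying to achieve here. It seems that you could just
as well use the unit-stride vlseg2 without the register stride, no? A stride of
8 bytes is exactly one pair of floats, so the segments are contiguous anyway.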
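To make the equivalence concrete, here is a scalar C model of the two
addressing forms. The model_* function names are hypothetical, and only the data
movement is modelled (no masking or tail policy). With a byte stride of 8, the
strided segment load picks up the same elements as the unit-stride one.

```c
#include <stddef.h>
#include <string.h>

/* Scalar model of a strided two-field segment load of 32-bit elements
 * (the data movement of vlsseg2e32.v, ignoring masking and tail policy).
 * Segment i starts i * stride_bytes bytes past base. */
static void model_vlsseg2e32(float *field0, float *field1,
                             const float *base, ptrdiff_t stride_bytes,
                             size_t vl)
{
    const unsigned char *p = (const unsigned char *)base;

    for (size_t i = 0; i < vl; i++) {
        const unsigned char *seg = p + (ptrdiff_t)i * stride_bytes;

        memcpy(&field0[i], seg, sizeof(float));
        memcpy(&field1[i], seg + sizeof(float), sizeof(float));
    }
}

/* Scalar model of the unit-stride vlseg2e32.v: segments are packed back to
 * back, so field 0 gets the even floats and field 1 the odd ones. */
static void model_vlseg2e32(float *field0, float *field1,
                            const float *base, size_t vl)
{
    for (size_t i = 0; i < vl; i++) {
        field0[i] = base[2 * i];
        field1[i] = base[2 * i + 1];
    }
}
```

With stride_bytes = 8, as set by `li t2, 8` in the patch, both models fill the
fields identically; the unit-stride form also leaves t2 free.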
+ vlsseg2e32.v v2, (a2), t2
+ vlsseg2e32.v v4, (a0), t2
+
+ vfmul.vv v6, v0, v2
+ vfmul.vv v7, v1, v3
+ vfmul.vv v8, v0, v3
+ vfmul.vv v9, v1, v2
+
+ vfadd.vv v4, v4, v6
+ vfsub.vv v4, v4, v7
+ vfadd.vv v5, v5, v8
+ vfadd.vv v5, v5, v9
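For reference, the arithmetic above is a complex multiply-accumulate,
sum += t * c, on (re, im) pairs. The per-lane C sketch below mirrors it
instruction by instruction. The function name is hypothetical, and mapping
a0/a1/a2 to sum/t/c is an assumption based on the usual RISC-V argument order.

```c
/* One lane of the quoted vector code, with register roles assumed as:
 *   v0/v1 = re/im loaded from a1 (t), v2/v3 = re/im loaded from a2 (c),
 *   v4/v5 = re/im of the accumulator loaded from a0 (sum). */
static void fcmul_add_lane(float *sum_re, float *sum_im,
                           float t_re, float t_im, float c_re, float c_im)
{
    float v6 = t_re * c_re;  /* vfmul.vv v6, v0, v2 */
    float v7 = t_im * c_im;  /* vfmul.vv v7, v1, v3 */
    float v8 = t_re * c_im;  /* vfmul.vv v8, v0, v3 */
    float v9 = t_im * c_re;  /* vfmul.vv v9, v1, v2 */

    *sum_re = *sum_re + v6 - v7;  /* vfadd v4 += v6, then vfsub v4 -= v7 */
    *sum_im = *sum_im + v8 + v9;  /* vfadd v5 += v8, then vfadd v5 += v9 */
}
```

For example, (1 + 2i) * (3 + 4i) = -5 + 10i, so an accumulator starting at
(1, 1) ends up at (-4, 11).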
+
+ vssseg2e32.v v4, (a0), t2
Same here: with an 8-byte stride, vssseg2e32.v is just vsseg2e32.v.
--
レミ・デニ-クールモン
http://www.remlab.net/