[FFmpeg-devel] [PATCH] lavc/lpc: R-V V apply_welch_window
Anton Khirnov
anton at khirnov.net
Mon Dec 11 11:11:28 EET 2023
Quoting Rémi Denis-Courmont (2023-12-08 18:46:51)
> +#if __riscv_xlen >= 64
> +func ff_lpc_apply_welch_window_rvv, zve64d
> + vsetvli t0, zero, e64, m8, ta, ma
> + vid.v v0
> + addi t2, a1, -1
> + vfcvt.f.xu.v v0, v0
> + li t3, 2
> + fcvt.d.l ft2, t2
> + srai t1, a1, 1
> + fcvt.d.l ft3, t3
> + li t4, 1
> + fdiv.d ft0, ft3, ft2 # ft0 = c = 2. / (len - 1)
> + fcvt.d.l fa1, t4 # fa1 = 1.
> + fsub.d ft1, ft0, fa1
> + vfrsub.vf v0, v0, ft1 # v0[i] = c - i - 1.
> +1:
> + vsetvli t0, t1, e64, m8, ta, ma
> + vfmul.vv v16, v0, v0 # no fused multiply-add as v0 is reused
> + sub t1, t1, t0
> + vle32.v v8, (a0)
> + fcvt.d.l ft2, t0
> + vfrsub.vf v16, v16, fa1 # v16 = 1. - w * w
> + sh2add a0, t0, a0
> + vsetvli zero, zero, e32, m4, ta, ma
> + vfwcvt.f.x.v v24, v8
> + vsetvli zero, zero, e64, m8, ta, ma
> + vfsub.vf v0, v0, ft2 # v0 -= vl
> + vfmul.vv v8, v24, v16
> + vse64.v v8, (a2)
> + sh3add a2, t0, a2
> + bnez t1, 1b
> +
> + andi t1, a1, 1
> + beqz t1, 2f
> +
> + sd zero, (a2)
> + addi a0, a0, 4
> + addi a2, a2, 8
> +2:
> + vsetvli t0, zero, e64, m8, ta, ma
> + vid.v v0
> + srai t1, a1, 1
> + vfcvt.f.xu.v v0, v0
> + fcvt.d.l ft1, t1
> + fsub.d ft1, ft0, ft1 # ft1 = c - (len / 2)
> + vfadd.vf v0, v0, ft1 # v0[i] = c - (len / 2) + i
> +3:
> + vsetvli t0, t1, e64, m8, ta, ma
> + vfmul.vv v16, v0, v0
> + sub t1, t1, t0
> + vle32.v v8, (a0)
> + fcvt.d.l ft2, t0
> + vfrsub.vf v16, v16, fa1 # v16 = 1. - w * w
> + sh2add a0, t0, a0
> + vsetvli zero, zero, e32, m4, ta, ma
> + vfwcvt.f.x.v v24, v8
> + vsetvli zero, zero, e64, m8, ta, ma
> + vfadd.vf v0, v0, ft2 # v0 += vl
> + vfmul.vv v8, v24, v16
> + vse64.v v8, (a2)
> + sh3add a2, t0, a2
> + bnez t1, 3b
I think it'd look a lot less like base64 < /dev/random if you vertically
aligned the first operands.
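For instance, a purely illustrative respacing of a few lines from the first
loop above (same instructions, comments dropped, only whitespace changes),
with the first operands starting in the same column:

        vsetvli    t0, t1, e64, m8, ta, ma
        vfmul.vv   v16, v0, v0
        sub        t1, t1, t0
        vle32.v    v8, (a0)
        fcvt.d.l   ft2, t0
        vfrsub.vf  v16, v16, fa1
        sh2add     a0, t0, a0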
--
Anton Khirnov