[FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
沈佩婷
shenpeiting at eswincomputing.com
Fri Jun 16 13:15:13 EEST 2023
Hi,
> -----Original Message-----
> From: "Rémi Denis-Courmont" <remi at remlab.net>
> Sent: 2023-06-16 03:25:07 (Friday)
> To: ffmpeg-devel at ffmpeg.org
> Cc: "Shen Peiting" <shenpeiting at eswincomputing.com>
> Subject: Re: [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
>
> On Thursday, 15 June 2023, 13:36:42 EEST, Peiting Shen wrote:
> > From: Shen Peiting <shenpeiting at eswincomputing.com>
> >
> > Optimize the scalar int32 sum_square calculation using RVV instructions
> >
> > Benchmarks on Spike (cycles):
> > len=128
> > ac3_sum_square_butterfly_int32_c: 8497
> > ac3_sum_square_butterfly_int32_rvv: 258
> > len=1280
> > ac3_sum_square_butterfly_int32_c: 84529
> > ac3_sum_square_butterfly_int32_rvv: 2274
> >
> > Co-Authored by: Yang Xiaojun <yangxiaojun at eswincomputing.com>
> > Co-Authored by: Huang Xing <huangxing1 at eswincomputing.com>
> > Co-Authored by: Zeng Fanchen <zengfanchen at eswincomputing.com>
> > Signed-off-by: Shen Peiting <shenpeiting at eswincomputing.com>
> > ---
> > libavcodec/riscv/ac3dsp_init.c | 8 +++++
> > libavcodec/riscv/ac3dsp_rvv.S | 53 ++++++++++++++++++++++++++++++++++
> > 2 files changed, 61 insertions(+)
> >
> > diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
> > index a4e75a7541..4fd4abe83e 100644
> > --- a/libavcodec/riscv/ac3dsp_init.c
> > +++ b/libavcodec/riscv/ac3dsp_init.c
> > @@ -26,6 +26,10 @@
> >
> > void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
> > void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, unsigned int len);
> > +void ff_ac3_sum_square_butterfly_int32_rvv(int64_t sum[4],
> > + const int32_t *coef0,
> > + const int32_t *coef1,
> > + int len);
> >
> > av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
> > {
> > @@ -35,6 +39,10 @@ av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
> > c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
> > c->float_to_fixed24 = ff_float_to_fixed24_rvv;
> > }
> > +#if (__riscv_xlen >= 64)
> > + if (flags & AV_CPU_FLAG_RVV_I64)
> > + c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_rvv;
> > +#endif
> > #endif
> > }
> >
> > diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
> > index d98e72c12c..4e0d238f85 100644
> > --- a/libavcodec/riscv/ac3dsp_rvv.S
> > +++ b/libavcodec/riscv/ac3dsp_rvv.S
> > @@ -63,3 +63,56 @@ func ff_float_to_fixed24_rvv, zve32x
> > bgtz a2, 1b
> > ret
> > endfunc
> > +
> > +
> > +func ff_ac3_sum_square_butterfly_int32_rvv, zve64x
> > + vsetvli t0, a3, e32, m2
> > + vle32.v v0, (a1)
> > + vle32.v v2, (a2)
> > + vadd.vv v4, v0, v2
> > + vsub.vv v6, v0, v2
> > + vwmul.vv v8, v0, v0
> > + vwmul.vv v12, v2, v2
> > + vwmul.vv v16, v4, v4
> > + vwmul.vv v20, v6, v6
> > + sub a3, a3, t0
> > + slli t0, t0, 2
> > + add a1, a1, t0
> > + add a2, a2, t0
> > + beq a3, x0, 2f
> > +1:
> > + vsetvli t0, a3, e32, m2
> > + vle32.v v0, (a1)
> > + vle32.v v2, (a2)
> > + vadd.vv v4, v0, v2
> > + vsub.vv v6, v0, v2
> > + vwmacc.vv v8, v0, v0
> > + vwmacc.vv v12, v2, v2
> > + vwmacc.vv v16, v4, v4
> > + vwmacc.vv v20, v6, v6
> > + sub a3, a3, t0
> > + slli t0, t0, 2
> > + add a1, a1, t0
> > + add a2, a2, t0
> > + bnez a3, 1b
> > +2:
> > + vsetvli t0, x0, e64, m4
> > + vmv.s.x v24, x0
> > + vmv.s.x v25, x0
> > + vmv.s.x v26, x0
> > + vmv.s.x v27, x0
> > + vredsum.vs v24, v8, v24
> > + vredsum.vs v25, v12, v25
> > + vredsum.vs v26, v16, v26
> > + vredsum.vs v27, v20, v27
>
> As far as I can tell this is a reserved encoding (cf. RVV 1.0 §3.4.2), and I
> believe that QEMU throws an Illegal instruction in this case. (I would check,
> but there is no checkasm test case for this function.) Does this actually work
> on your simulator? Because if so, then your simulator is probably broken or
> buggy.
>
RVV 1.0 §14:
Vector reduction operations take a vector register group of elements and a scalar held in
element 0 of a vector register, and perform a reduction using some binary operator, to produce
a scalar result in element 0 of a vector register. The scalar input and output operands
are held in element 0 of a single vector register, not a vector register group, so any vector
register can be the scalar source or destination of a vector reduction regardless of LMUL setting.

RVV 1.0 §16.1. Integer Scalar Move Instructions:
The integer scalar read/write instructions transfer a single value between a scalar x register and
element 0 of a vector register. The instructions ignore LMUL and vector register groups.

Based on the above, I think this encoding is legal.
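
As a rough illustration of the reduction semantics quoted above (and not a statement about which encodings a given simulator accepts), a scalar C model of vredsum.vs at SEW=64 could look like the sketch below. The function name and parameters are hypothetical, and the model assumes vl > 0.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of vredsum.vs with SEW=64 (name and signature
 * are illustrative only, not part of FFmpeg or any RVV toolchain).
 *
 * vs2_group : the vl active elements of the source register group
 * vs1_elem0 : element 0 of the scalar source register
 * returns   : the value written to element 0 of the destination register
 *
 * Only element 0 of the scalar operand is read and only element 0 of the
 * destination is produced, independent of how many registers the source
 * group spans. Assumes vl > 0.
 */
static int64_t model_vredsum_e64(const int64_t *vs2_group, size_t vl,
                                 int64_t vs1_elem0)
{
    /* Accumulate in unsigned arithmetic so overflow wraps modulo 2^64. */
    uint64_t acc = (uint64_t)vs1_elem0;

    for (size_t i = 0; i < vl; i++)
        acc += (uint64_t)vs2_group[i];

    return (int64_t)acc;
}
```

In the patch, the scalar operand is zeroed with vmv.s.x beforehand, so this model would be called with vs1_elem0 equal to 0.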
In fact, all FATE tests pass under QEMU 6.0.0 with a build compiled by riscv-unknown-linux-gnu-gcc 13.0.1, configured as follows:
./configure --enable-cross-compile --cross-prefix=riscv64-unknown-linux-gnu- --arch=riscv \
    --extra-cflags="-march=rv64imafdcbv -mabi=lp64d --static -I/home/user/code/iconv/iconv-riscv/include" \
    --prefix=ffshare --extra-libs="-static -liconv" --extra-ldflags="-L/home/user/code/iconv/iconv-riscv/lib" \
    --target-os=linux --target-exec="qemu-riscv64 -cpu rv64,x-v=true,x-b=true,x-zpn=true,x-zbpbo=true,x-zpsfoperand=true,x-arith=true" \
    --enable-gpl --enable-memory-poisoning
We will fix the non-standard encoding mentioned in the emails and add the checkasm test in patch v2.
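
For reference when comparing results in a checkasm test, here is a minimal scalar sketch of what the routine is expected to compute, as read from the assembly in the patch: the four sums of squares of coef0, coef1, their sum and their difference. The function name is hypothetical, and the sketch assumes the inputs are small enough that coef0[i] + coef1[i] and coef0[i] - coef1[i] fit in 32 bits (the vector code forms them at e32).

```c
#include <stdint.h>

/*
 * Hypothetical scalar reference (the name is illustrative, not the FFmpeg
 * symbol). Assumes coef0[i] +/- coef1[i] do not overflow int32_t, since the
 * RVV version computes them with 32-bit vadd.vv/vsub.vv.
 */
static void sum_square_butterfly_int32_ref(int64_t sum[4],
                                           const int32_t *coef0,
                                           const int32_t *coef1,
                                           int len)
{
    sum[0] = sum[1] = sum[2] = sum[3] = 0;

    for (int i = 0; i < len; i++) {
        int64_t lt = coef0[i];   /* first input  */
        int64_t rt = coef1[i];   /* second input */
        int64_t md = lt + rt;    /* butterfly sum        */
        int64_t sd = lt - rt;    /* butterfly difference */

        sum[0] += lt * lt;
        sum[1] += rt * rt;
        sum[2] += md * md;
        sum[3] += sd * sd;
    }
}
```

A test would fill two buffers with random coefficients in the expected range, call both this reference and the RVV function, and compare the four 64-bit results.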
> > + vsetivli t0, 1, e64, m1
> > + vse64.v v24, (a0)
> > + addi a0, a0, 8
> > + vse64.v v25, (a0)
> > + addi a0, a0, 8
> > + vse64.v v26, (a0)
> > + addi a0, a0, 8
> > + vse64.v v27, (a0)
> > + addi a0, a0, 8
> > + ret
> > +endfunc
>
>
> --
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
>