[FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add
Rémi Denis-Courmont
remi at remlab.net
Wed Sep 27 19:27:29 EEST 2023
On Wednesday, 27 September 2023, 4:47:26 EEST, flow gg wrote:
> >>> please pad mnemonics to at least 8 columns for consistency
>
> okay, changed
>
> >>> It seems that you could just as well use vlseg2 without register
> >>> stride, no?
>
> yes, vlseg would be better, changed
>
> >>> Note that you could do the double versions with very little extra
> >>> effort.
>
> okay
>
> >>> But really, DO NOT use a fixed vector length here. At best, you're
> >>> wasting half the vector width. Your input has a variable size, use it.
>
> okay, changed
>
> >>> I'm a bit surprised that the performance improves this much,
> >>> considering that the C910 is notoriously bad at both segmented and
> >>> strided loads. It might be that the C version is just very bad due
> >>> to lack of aliasing optimisations.
>
> Thanks, you reminded me.
> Sorry, I had forgotten that there was a problem.
> A few days ago, I wanted to try running some existing benchmarks,
>
> ```
> tests/checkasm/checkasm --bench --test=aacpsdsp
> tests/checkasm/checkasm --bench --test=alacdsp
> tests/checkasm/checkasm --bench --test=audiodsp
> tests/checkasm/checkasm --bench --test=g722dsp
> tests/checkasm/checkasm --bench --test=vorbisdsp
> tests/checkasm/checkasm --bench --test=float_dsp
> tests/checkasm/checkasm --bench --test=fixed_dsp
> tests/checkasm/checkasm --bench --test=af_afir
> ```
>
> but they all returned 0.0.
>
> For example,
>
> ```
> butterflies_float_c: 0.0
> butterflies_float_rvv_f32: 0.0
> scalarproduct_float_c: 0.0
> scalarproduct_float_rvv_f32: 0.0
> vector_dmac_scalar_c: 0.0
> vector_dmac_scalar_rvv_f64: 0.0
> ...
> ```

OK, this reproduces on both SiFive and T-Head hardware here. You need to
revert 09731fbfc3a914ec4f6ffad60aa9062db6a8f6aa.
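
To illustrate the aliasing remark quoted above, here is a minimal scalar
sketch of a multiply-accumulate over interleaved complex floats (re, im, re,
im, ...). It is an illustration only, not the code in af_afir: the name
`cmul_add_sketch` and the `restrict` qualifiers are assumptions made for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not the FFmpeg source.
 * Computes sum[n] += t[n] * c[n] for len complex values stored as
 * interleaved (re, im) float pairs. `restrict` promises the compiler
 * that the three buffers do not overlap. */
static void cmul_add_sketch(float *restrict sum, const float *restrict t,
                            const float *restrict c, ptrdiff_t len)
{
    assert(len >= 0);
    for (ptrdiff_t n = 0; n < len; n++) {
        const float tre = t[2 * n], tim = t[2 * n + 1];
        const float cre = c[2 * n], cim = c[2 * n + 1];

        /* (tre + i*tim) * (cre + i*cim), accumulated into sum */
        sum[2 * n]     += tre * cre - tim * cim;
        sum[2 * n + 1] += tre * cim + tim * cre;
    }
}
```

Without `restrict`, the compiler has to assume that a store to `sum` may
change `t` or `c`, which can keep it from vectorising the loop. That would
make a plain C version look slower than it needs to be.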
--
レミ・デニ-クールモン
http://www.remlab.net/