[FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

Wed Sep 27 19:27:29 EEST 2023

On Wednesday, 27 September 2023, 4:47:26 EEST, flow gg wrote:
> >>> please pad mnemonics to at least 8 columns for consistency
> 
> okay, changed
> 
> >>> It seems that you could just as well use vlseg2 without register stride, no?
> 
> yes, vlseg will be better, changed
> 
> >>> Note that you could do the double versions with very little extra effort.
> 
> okay
> 
> >>> But really, DO NOT use a fixed vector length here. At best, you're wasting half
> >>> the vector width. Your input has a variable size, use it.
> 
> okay, changed
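
For readers following along, the point about not fixing the vector length can be sketched in plain C. This is only a scalar model under stated assumptions, not the code from the patch: `SKETCH_VLMAX`, `sketch_setvl()` and `fcmul_add_sketch()` are hypothetical names, the operation is assumed to be a complex multiply-accumulate over interleaved (re, im) float pairs as the subject line suggests, and any special tail handling in the real af_afir function is not reproduced. `sketch_setvl()` stands in for `vsetvli`, and the de-interleaving loop stands in for a two-field segment load such as `vlseg2e32.v`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hardware vector length in elements.
 * On real RVV hardware this is not a constant: vsetvli reports it. */
#define SKETCH_VLMAX 8

/* Simplified model of vsetvli: process as many elements as remain,
 * capped at the hardware maximum. */
static size_t sketch_setvl(size_t remaining)
{
    return remaining < SKETCH_VLMAX ? remaining : SKETCH_VLMAX;
}

/* sum[n] += t[n] * c[n] for len complex numbers stored as interleaved
 * (re, im) float pairs. Assumed semantics, for illustration only. */
static void fcmul_add_sketch(float *sum, const float *t, const float *c,
                             size_t len)
{
    while (len > 0) {
        size_t vl = sketch_setvl(len);  /* variable, follows the input */
        float tre[SKETCH_VLMAX], tim[SKETCH_VLMAX];
        float cre[SKETCH_VLMAX], cim[SKETCH_VLMAX];

        /* Model of two segment loads: split re and im into separate
         * "registers" without any register stride. */
        for (size_t i = 0; i < vl; i++) {
            tre[i] = t[2 * i];
            tim[i] = t[2 * i + 1];
            cre[i] = c[2 * i];
            cim[i] = c[2 * i + 1];
        }

        /* Complex multiply-accumulate, written back interleaved
         * (a segment store in a real implementation). */
        for (size_t i = 0; i < vl; i++) {
            sum[2 * i]     += tre[i] * cre[i] - tim[i] * cim[i];
            sum[2 * i + 1] += tre[i] * cim[i] + tim[i] * cre[i];
        }

        /* Advance by the number of complex elements just processed. */
        sum += 2 * vl;
        t   += 2 * vl;
        c   += 2 * vl;
        len -= vl;
    }
}
```

Because the loop asks for a new length on every iteration, the same code handles short inputs, long inputs and any hardware vector width, with no separate scalar tail loop.
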
> 
> >>> I'm a bit surprised that the performance improves this much, considering that
> >>> the C910 is notoriously bad at both segmented and strided loads. It might be that
> >>> the C version is just very bad due to lack of aliasing optimisations.
> 
> Thanks, you reminded me.
> Sorry, I had forgotten that there was a problem.
> A few days ago, I tried running some existing benchmarks:
> 
> ```
> tests/checkasm/checkasm --bench --test=aacpsdsp
> tests/checkasm/checkasm --bench --test=alacdsp
> tests/checkasm/checkasm --bench --test=audiodsp
> tests/checkasm/checkasm --bench --test=g722dsp
> tests/checkasm/checkasm --bench --test=vorbisdsp
> tests/checkasm/checkasm --bench --test=float_dsp
> tests/checkasm/checkasm --bench --test=fixed_dsp
> tests/checkasm/checkasm --bench --test=af_afir
> ```
> 
> but they all returned 0.0.
> 
> For example,
> 
> ```
> butterflies_float_c: 0.0
> butterflies_float_rvv_f32: 0.0
> scalarproduct_float_c: 0.0
> scalarproduct_float_rvv_f32: 0.0
> vector_dmac_scalar_c: 0.0
> vector_dmac_scalar_rvv_f64: 0.0
> ...
> ```

OK, this reproduces on both SiFive and T-Head hardware here. You need to 
revert 09731fbfc3a914ec4f6ffad60aa9062db6a8f6aa.

-- 
レミ・デニ-クールモン
http://www.remlab.net/