[FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm

flow gg hlefthleft at gmail.com
Fri Dec 22 03:34:39 EET 2023


func ff_decorrelate_sm_rvv, zve32x
1:
        vsetvli  t0, a2, e32, m8, ta, ma
        vle32.v  v8, (a1)
        sub      a2, a2, t0
        vle32.v  v0, (a0)
        vssra.vi  v8, v8, 1
        vsub.vv  v16, v0, v8
        vse32.v  v16, (a0)
        sh2add   a0, t0, a0
        vadd.vv  v16, v0, v8
        vse32.v  v16, (a1)
        sh2add   a1, t0, a1
        bnez     a2, 1b
        ret
endfunc

Is this what you meant? With this version, or with vsra in place of vssra,
some tests fail and the result values differ by 1. I'm not sure where the
problem is.
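
One possible explanation for the off-by-one, as a minimal C sketch rather than a confirmed diagnosis. It assumes the scalar reference computes a -= b >> 1; p1 = a; p2 = a + b, and that vxrm is set to round-to-nearest-up (rnu), so that vssra.vi by 1 yields (b >> 1) + (b & 1). Under those assumptions p2 equals a + vssra(b, 1) while p1 needs a - vsra(b, 1), so reusing one shifted copy of b for both the subtract and the add is wrong whenever b is odd. All function names below are hypothetical models, not FFmpeg or RVV APIs.

```c
#include <stdint.h>

/* Assumed scalar reference (modelled on the C version being replaced). */
static void ref_sm(int32_t a, int32_t b, int32_t *p1, int32_t *p2)
{
    a -= b >> 1;   /* relies on arithmetic right shift of signed values */
    *p1 = a;
    *p2 = a + b;
}

/* Model of vsra.vi x, x, 1: plain arithmetic shift (rounds toward -inf). */
static int32_t sra1(int32_t b) { return b >> 1; }

/* Model of vssra.vi x, x, 1 with vxrm = rnu (assumption): adds the bit
 * that was shifted out. */
static int32_t ssra1_rnu(int32_t b) { return (b >> 1) + (b & 1); }

/* Same shifted value used for both outputs, with vsra. */
static void all_vsra(int32_t a, int32_t b, int32_t *p1, int32_t *p2)
{
    int32_t h = sra1(b);
    *p1 = a - h;
    *p2 = a + h;
}

/* Same shifted value used for both outputs, with vssra (as posted above). */
static void all_vssra(int32_t a, int32_t b, int32_t *p1, int32_t *p2)
{
    int32_t h = ssra1_rnu(b);
    *p1 = a - h;
    *p2 = a + h;
}

/* Truncating shift for the subtract, rounding shift for the add. */
static void mixed(int32_t a, int32_t b, int32_t *p1, int32_t *p2)
{
    *p1 = a - sra1(b);
    *p2 = a + ssra1_rnu(b);
}

/* Count small inputs on which a variant disagrees with the reference. */
static int count_mismatches(void (*variant)(int32_t, int32_t,
                                            int32_t *, int32_t *))
{
    int bad = 0;
    for (int32_t a = -4; a <= 4; a++)
        for (int32_t b = -5; b <= 5; b++) {
            int32_t r1, r2, v1, v2;
            ref_sm(a, b, &r1, &r2);
            variant(a, b, &v1, &v2);
            bad += (r1 != v1) || (r2 != v2);
        }
    return bad;
}
```

Calling count_mismatches(mixed) should report no disagreement, while all_vsra and all_vssra each disagree on odd b only. If this model holds, the loop would need vsra for the value fed to vsub and vssra (with vxrm explicitly set to rnu) for the value fed to vadd.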

Rémi Denis-Courmont <remi at remlab.net> wrote on Fri, Dec 22, 2023 at 00:08:

> On Monday, 18 December 2023, 17:16:27 EET, flow gg wrote:
> > C908:
> > decorrelate_sm_c: 130.0
> > decorrelate_sm_rvv_i32: 43.7
>
> +
> +func ff_decorrelate_sm_rvv, zve32x
> +1:
> +        vsetvli  t0, a2, e32, m8, ta, ma
> +        vle32.v  v0, (a0)
> +        sub a2,  a2, t0
> +        vle32.v  v8, (a1)
> +        vsra.vi  v16, v8, 1
>
> You should load v8 first, since it is used as input before v0.
>
> +        vsub.vv  v0, v0, v16
> +        vse32.v  v0, (a0)
> +        sh2add   a0, t0, a0
> +        vadd.vv  v0, v0, v8
>
> You can use VSSRA, and then VADD won't need to depend on the output of
> VSUB.
>
> +        vse32.v  v0, (a1)
> +        sh2add   a1, t0, a1
> +        bnez     a2, 1b
> +        ret
> +endfunc
>
> --
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
