[FFmpeg-devel] [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16

Rémi Denis-Courmont remi at remlab.net
Sun Jan 7 10:03:00 EET 2024


On Sunday, 7 January 2024, 3.33.39 EET flow gg wrote:
> I tested it, and indeed using vwsub is faster. Updated it in the reply.
> 
> ---
> 
> I have a question: if I tweak the load order a bit, using one less vset, it
> leads to being slower (the patch I submitted is 13.2, if I make the
> following change, the time would be 15.2).
> But I thought it would be faster.

I would guess that v0 is needed before v8 in the internal implementation of
vwsub. This kind of makes sense, as the narrow elements still need to be
sign-extended. In the reordered version, vle8 is issued last, right before
vwsub, so vwsub ends up stalling the pipeline while it waits for vle8 to
complete. That's just a guess though, as I don't have internal cycle timing
documentation.

> - vsetvli      t0, a2, e8, m2, tu, ma
> - vle8.v       v0, (a0)
> - sub          a2, a2, t0
> - vsetvli      zero, t0, e16, m4, tu, ma
> - vle16.v      v8, (a1)
> - vsetvli      zero, t0, e8, m2, tu, ma
> - vwsub.wv     v16, v8, v0
> 
> + vsetvli      t0, a2, e16, m4, tu, ma
> + vle16.v      v8, (a1)
> + sub          a2, a2, t0
> + vsetvli      zero, t0, e8, m2, tu, ma
> + vle8.v       v0, (a0)
> + vwsub.wv     v16, v8, v0
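
For readers following along, here is a rough scalar sketch in C of what the
widening subtract feeds into, assuming the function computes a sum of squared
differences as its name suggests. The name ssd_int8_vs_int16_ref and the
argument order (8-bit buffer first, matching a0, then 16-bit buffer, matching
a1) are assumptions made for illustration, not FFmpeg's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, for illustration only. */
static int ssd_int8_vs_int16_ref(const int8_t *pix1, const int16_t *pix2,
                                 size_t size)
{
    int score = 0;

    for (size_t i = 0; i < size; i++) {
        /* Counterpart of "vwsub.wv v16, v8, v0": the 16-bit element minus
         * the sign-extended 8-bit element. */
        int diff = pix2[i] - (int)pix1[i];

        score += diff * diff;
    }
    return score;
}
```

Note that this sketch does the subtraction in int, so it does not model the
16-bit wraparound of the vector result.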

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/