[FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

Fri Mar 8 11:08:01 EET 2024

On 8 March 2024 02:45:46 GMT+02:00, flow gg <hlefthleft at gmail.com> wrote:
>> Isn't it also faster to max LMUL for the adds here?
>
>It requires the use of one more vset, making the time slightly longer:
>147.7 (m1), 148.7 (m8 + vset).

A variation of about 0.7% on a single set of kernels will end up below measurement noise in real overall codec usage. Besides, reducing I-cache contention can improve performance in other ways, and a larger LMUL should also improve performance on bigger cores with more ALUs. So it's not all black and white.
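
To put rough numbers on that, here is a minimal sketch in C of the arithmetic. The helper names are made up for illustration, and the share of total run time spent in these kernels is purely an assumption, not something measured in this thread.

```c
/* Relative slowdown of `slow` versus `fast`, as a fraction (0.01 == 1%). */
static double relative_slowdown(double fast, double slow)
{
    return (slow - fast) / fast;
}

/*
 * Overall slowdown when only `kernel_share` of the total run time is spent
 * in the affected kernels: total = (1 - share) + share * (1 + slowdown),
 * so the overall increase is share * slowdown.
 * ASSUMPTION: `kernel_share` is a hypothetical input, not a measured value.
 */
static double overall_slowdown(double kernel_share, double kernel_slowdown)
{
    return kernel_share * kernel_slowdown;
}
```

With the benchmark figures quoted above, relative_slowdown(147.7, 148.7) comes out just under 0.7%. If these kernels accounted for, say, 10% of decoding time (a hypothetical share), overall_slowdown(0.10, ...) would be below 0.07%.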

My personal preference is to keep the code small if it makes almost no difference, but I'm not the BDFL.

>> Also this might not be much noticeable on C908, but avoiding sequential
>> dependencies on the address registers may help. I mean, avoid using as
>> address operand a value that was calculated by the immediate previous
>> instruction.
>
>Okay, but the test results haven't changed..
>It would add more than ten lines of code, perhaps shorter code would be better?

I don't know. There are definitely in-order vector cores coming, and data dependencies will hurt them. But I don't know if anyone will care about FFmpeg on those.