[FFmpeg-devel] [PATCHv5 1/4] lavc/h263dsp: add DCT dequantisation functions
Rémi Denis-Courmont
remi at remlab.net
Thu Jun 13 22:22:18 EEST 2024
Le torstaina 13. kesäkuuta 2024, 5.23.12 EEST Andreas Rheinhardt a écrit :
> Rémi Denis-Courmont:
> > Note that optimised implementations of these functions will be taken
> > into actual use only if MpegEncContext.dct_unquantize_h263_{inter,intra}
> > are *not* overloaded by existing optimisations.
> >
> > ---
> > This adds the plus ones back, saving two branch instructions in C and
> > one in assembler (at the cost of two unconditional adds).
>
> I don't see how this saves branch instructions:
You can compare versions 4 and 5. The C had the extra if() that you complained
about and the assembler had an explicit extra branch.
That being said, if someone wants to microoptimise the C version of DSP
function, there are much greater and much less debatable savings to be had
elsewhere - for instance adding missing restrict keywords (I don't mean in
this particular case).
The point of this patchset is not to optimise the C code, rather to enable
checkasm and avoid copying the same scalar prologues in RVV and SSE2.
> It seems to me that your intra assembly simply ignores [that] len -1 is
> allowed to be 0.
> If you were to check for the case of nCoeffs == 0 in
> dct_unquantize_h263_intra_c (before the call!), you could write assembly
> with this restriction in mind.
If it really is a common case worth optimising for, then indeed it could be
checked in the common C calling code. But that is not a call that I can
credibly make, and is outside the scope of this MR. If someone has data to
back that optimisation, they are welcome to submit it as a separate patch.
The assembler works fine with 0 length. The only assumption is that the length
is the actual and unsigned length.
> It would also allow the compiler to avoid
> the branch in case it is known that nCoeffs == 63.
--
Rémi Denis-Courmont
http://www.remlab.net/
More information about the ffmpeg-devel
mailing list