[FFmpeg-devel] MMX accelerated DSP functions for VC1/WMV3 decoders

Sat Jun 30 20:35:17 CEST 2007

Hi,

Michael Niedermayer wrote:
>> +#if defined(CONFIG_VC1_DECODER) || defined(CONFIG_WMV3_DECODER)
>> +extern void ff_vc1dsp_init_mmx(DSPContext* dsp, AVCodecContext *avctx);
>> +#endif
>> +
> 
> the #if is unneeded

Indeed, even if it is always declared, the symbol won't be referenced
when neither decoder is enabled.

> [...]
>> +     "psllw     $1, %%mm1               \n\t"                   \
>> +     "psllw     $1, %%mm2               \n\t"                   \
> 
> paddw

Is that always faster?
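
Speed aside, the two forms give the same result: shifting a 16-bit lane
left by one and adding the lane to itself both wrap modulo 2^16. A
minimal scalar sketch of one lane, with helper names that are mine and
not part of the patch, and which says nothing about which instruction
is faster on a given CPU:

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of "psllw $1": logical left shift by one, wrapping. */
static uint16_t lane_psllw1(uint16_t x)
{
    return (uint16_t)(x << 1);
}

/* One 16-bit lane of "paddw" with both operands equal: wrapping x + x. */
static uint16_t lane_paddw_self(uint16_t x)
{
    return (uint16_t)(x + x);
}

/* Exhaustively check that both forms agree for every lane value. */
static void check_shift_equals_add(void)
{
    uint32_t v;

    for (v = 0; v <= 0xFFFF; v++)
        assert(lane_psllw1((uint16_t)v) == lane_paddw_self((uint16_t)v));
}
```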

> duplicating each filter 4 times with macros is unacceptable
> the overhead for 2 calls is not that big

OK. If I understand the plan correctly, you want 4 functions instead,
one per shift position. If we call N the code size of one {1,2,3}/4
shift filter (and neglect the no-shift code size), we currently have a
total code size of:
3*3*(N+N) + 2*3*(N+epsilon) + epsilon? ~ 24N
Your plan is to get it to 3*N + epsilon ~ 3N

However, with this, the same function is used for both vertical and
horizontal filtering. The tap offsets are no longer known at compile
time, so we get a more complex addressing pattern (of the [eax+ecx+N]
kind) and have one register fewer. And I would probably have to rewrite
part of the macros.
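
To make the addressing concern concrete, here is a scalar C sketch of
such a shared function. The name filter_4tap, its signature, and the
(-1, 9, 9, -1)/16 taps are illustrative assumptions, not the code from
the patch. The point is only that the distance between taps becomes a
runtime value (tap_step) that has to be kept in a register:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical shared 4-tap filter over n pixels of one row.
 * tap_step is 1 for horizontal filtering and the line stride for
 * vertical filtering, so every tap address is src + i + k * tap_step
 * with tap_step only known at run time.
 */
static void filter_4tap(uint8_t *dst, const uint8_t *src,
                        ptrdiff_t tap_step, int n)
{
    int i;

    assert(dst && src && n >= 0);
    for (i = 0; i < n; i++) {
        const uint8_t *p = src + i;
        /* Illustrative taps (-1, 9, 9, -1) with rounding, sum is 16. */
        int v = -p[-tap_step] + 9 * p[0] + 9 * p[tap_step]
                - p[2 * tap_step] + 8;

        if (v < 0)
            v = 0;          /* clamp before shifting */
        v >>= 4;
        dst[i] = (uint8_t)(v > 255 ? 255 : v);
    }
}
```

With separate vertical and horizontal versions, tap_step would be a
constant in the horizontal one and could be folded into the addressing.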

What would you say about having one vertical and one horizontal function
for each of the {1,2,3}/4 shift positions? This should double the code
size compared to your plan, but looks much simpler to me.
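
For comparison, the three estimates in units of N, with the epsilon
terms neglected. The constant names are mine, and the 6N figure is just
twice the 3N of your plan:

```c
#include <assert.h>

/* Code-size estimates in units of N; epsilon terms are neglected. */
enum {
    SIZE_CURRENT   = 3 * 3 * (1 + 1) + 2 * 3 * 1, /* 18N + 6N */
    SIZE_SHARED    = 3,                /* one function per shift position */
    SIZE_SPLIT_H_V = 2 * SIZE_SHARED   /* vertical + horizontal per position */
};

/* Sanity-check the arithmetic used in this mail. */
static void check_size_estimates(void)
{
    assert(SIZE_CURRENT == 24);
    assert(SIZE_SHARED == 3);
    assert(SIZE_SPLIT_H_V == 6);
}
```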

Best regards,
Christophe GISQUET