[FFmpeg-devel] MMX accelerated DSP functions for VC1/WMV3 decoders
Christophe GISQUET
christophe.gisquet
Sat Jun 30 20:35:17 CEST 2007
Hi,
Michael Niedermayer wrote:
>> +#if defined(CONFIG_VC1_DECODER) || defined(CONFIG_WMV3_DECODER)
>> +extern void ff_vc1dsp_init_mmx(DSPContext* dsp, AVCodecContext *avctx);
>> +#endif
>> +
>
> the #if is unneeded
Indeed, even if it is declared, the symbol won't be used when those
conditions are not met.
> [...]
>> + "psllw $1, %%mm1 \n\t" \
>> + "psllw $1, %%mm2 \n\t" \
>
> paddw
Is that always faster?
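For reference, both forms give the same result in every 16-bit lane, so the choice is purely about speed, which depends on the processor. Below is a minimal sketch in plain C that emulates a single lane of each instruction. The helper names are hypothetical, and it checks only equivalence, not timing.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates one 16-bit lane of "psllw $1": shift left by one bit,
 * with the result truncated to 16 bits. */
static uint16_t lane_psllw1(uint16_t x)
{
    return (uint16_t)(x << 1);
}

/* Emulates one 16-bit lane of "paddw reg, reg" with both operands equal:
 * a wrapping 16-bit add of the value to itself. */
static uint16_t lane_paddw_self(uint16_t x)
{
    return (uint16_t)(x + x);
}

/* Compares the two forms over all 65536 lane values.
 * Returns 1 if they always agree, 0 otherwise. */
static int doubling_forms_agree(void)
{
    for (uint32_t v = 0; v <= 0xFFFF; v++) {
        if (lane_psllw1((uint16_t)v) != lane_paddw_self((uint16_t)v))
            return 0;
    }
    return 1;
}

/* Convenience self-check that can be called from a test harness. */
static void doubling_self_check(void)
{
    assert(doubling_forms_agree());
}
```

Since the results are identical, swapping one form for the other in the macros would not change the output, only the instruction mix.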
> duplicating each filter 4 times with macros is unacceptable
> the overhead for 2 calls is not that big
OK. If I understand the plan correctly, you want 4 functions to be
created instead, one per shift position. If we say that the code size
for each of the {1,2,3}/4 shifts is N (and neglect the no-shift code
size), we currently have a total code size of:
3*3*(N+N) + 2*3*(N+epsilon) + epsilon ~ 24N
Your plan is to get it down to 3*N + epsilon ~ 3N
However, with this, the same function is used for vertical and
horizontal filtering. The tap offsets are no longer known at compile
time, hence we have a more complex addressing pattern (of the
[eax+ecx+N] kind) and one fewer register. And I will probably have to
rewrite part of the macros.
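To illustrate the addressing issue, here is a scalar sketch in C of a single 4-tap filter shared by both directions. It is not the MMX code from the patch. The function and table names are hypothetical, and the tap values and rounding are assumptions chosen to resemble a quarter-pel bicubic filter. The point is that the distance between taps ("step") is a runtime argument: 1 for horizontal filtering, the line stride for vertical filtering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tap table, indexed by shift position 0..3 (in quarter pixels).
 * Every row sums to 64, so the result is normalized with a shift by 6. */
static const int filter_taps[4][4] = {
    {  0, 64,  0,  0 },   /* no shift: plain copy */
    { -4, 53, 18, -3 },   /* 1/4 */
    { -4, 36, 36, -4 },   /* 1/2 */
    { -3, 18, 53, -4 },   /* 3/4 */
};

/* Filters one pixel. "step" is the distance between consecutive taps:
 * 1 for horizontal filtering, the line stride for vertical filtering.
 * Because step is only known at run time, each tap address has the form
 * src + k*step, which is what costs a register in the asm version. */
static uint8_t filter_tap4(const uint8_t *src, ptrdiff_t step, int shift)
{
    const int *c = filter_taps[shift];
    int sum = c[0] * src[-step] + c[1] * src[0]
            + c[2] * src[step]  + c[3] * src[2 * step];

    sum += 32;                 /* rounding term (assumed) */
    if (sum < 0)
        return 0;              /* clip low before shifting */
    sum >>= 6;
    return sum > 255 ? 255 : (uint8_t)sum;
}

/* Convenience self-check: a flat area must stay flat for every shift. */
static void filter_self_check(void)
{
    static const uint8_t flat[4] = { 100, 100, 100, 100 };
    for (int shift = 0; shift < 4; shift++)
        assert(filter_tap4(flat + 1, 1, shift) == 100);
}
```

Calling filter_tap4(p, 1, shift) filters along a row and filter_tap4(p, stride, shift) filters along a column. With two separate functions, the horizontal one could hard-code the offsets -1, 0, 1 and 2 as immediates.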
What would you say about having 1 vertical function and 1 horizontal
function for each of the {1,2,3}/4 shift positions? This should double
the code size compared to your plan, but looks much simpler to me.
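The three size estimates can be tallied as follows, counting in units of N and dropping the epsilon terms. This is only a sketch of the arithmetic above. The function names are hypothetical, and the reading of each term (9 combinations shifted in both directions, 6 shifted in one direction only) is an assumption about how the current macros expand.

```c
#include <assert.h>

/* Current layout: 3*3 combinations with both a horizontal and a vertical
 * shift, each holding two filters (N+N), plus 2*3 combinations shifted
 * in one direction only (N each). */
static int size_current_in_N(void)
{
    return 3 * 3 * (1 + 1) + 2 * 3 * 1;
}

/* One shared function per non-zero shift position. */
static int size_shared_in_N(void)
{
    return 3;
}

/* One horizontal and one vertical function per non-zero shift position. */
static int size_split_in_N(void)
{
    return 2 * 3;
}

/* Convenience self-check: the split layout is twice the shared one. */
static void size_self_check(void)
{
    assert(size_split_in_N() == 2 * size_shared_in_N());
}
```

Under these assumptions the split layout costs about 6N, a quarter of the current 24N.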
Best regards,
Christophe GISQUET