[FFmpeg-devel] [PATCH] Move MLP's dot product to DSPContext

Tue Apr 21 04:01:10 CEST 2009

On Mon, Apr 20, 2009 at 9:40 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Apr 20, 2009 at 02:29:09AM -0300, Ramiro Polla wrote:
>> On Mon, Apr 20, 2009 at 12:14 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Sun, Apr 19, 2009 at 10:10:05PM -0300, Ramiro Polla wrote:
>> >> The attached file moves MLP's dot product to DSPContext. The filter order
>> >> is a maximum of 8, and in the rematrix stage it's a maximum of 5+2
>> >> channels for MLP and 7+0 channels for TrueHD, so it all fits in 8
>> >> (hopefully) optimized functions.
>> >
>> > the functions are too small, the call overhead is too much
>> > 1-8 multiplications and 1-8 additions is not enough ...
>>
>> I thought that would happen too, but strangely there was a speedup.
>
> you wrote the whole function in asm() and that was slower?

Attached are three asm variants: sse2, sse4, and altivec.

Here are the benchmarks:

- on x86
current:  3700ms
array of functions in dspcontext:
c      :  3300ms
sse2   :  3400ms
sse4   :  3200ms
inlined in mlpdec.c:
c      :  3500ms
sse2   :  3200ms
sse4   :  3100ms

- on x86_64 (can't run sse4)
current:  2070ms
array of functions in dspcontext:
c      :  2600ms (badly vectorized)
c      :  1920ms (not vectorized)
sse2   :  2450ms
inlined in mlpdec.c:
c      :  2800ms (badly vectorized)
c      :  1980ms (not vectorized)
sse2   :  2450ms

- on ppc
current:  9800ms
array of functions in dspcontext:
c      : 10800ms
altivec: 10000ms
inlined in mlpdec.c:
c      :  9400ms
altivec:  8800ms
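
For anyone following the numbers, the two layouts compared above ("array of functions in dspcontext" versus "inlined in mlpdec.c") can be sketched roughly as below. This is only an illustration under my own assumptions, not the code from the attached patches: the names HypotheticalDSP, hypothetical_dsp_init, dot_1..dot_8 and dot_inline are made up, and the int32_t inputs with an int64_t accumulator are assumed for the sake of the example.

```c
#include <stdint.h>

#define MAX_ORDER 8   /* the thread states orders 1..8 are enough */

/* Hypothetical signature: one fixed-length dot product per order. */
typedef int64_t (*dot_fn)(const int32_t *a, const int32_t *b);

/* Generate dot_1 .. dot_8; the trip count is a compile-time constant,
 * so the compiler is free to fully unroll each one. */
#define DOT_N(n)                                                \
static int64_t dot_##n(const int32_t *a, const int32_t *b)      \
{                                                               \
    int64_t sum = 0;                                            \
    for (int i = 0; i < (n); i++)                               \
        sum += (int64_t)a[i] * b[i];                            \
    return sum;                                                 \
}

DOT_N(1) DOT_N(2) DOT_N(3) DOT_N(4)
DOT_N(5) DOT_N(6) DOT_N(7) DOT_N(8)

/* Variant 1: array of functions, indexed by order - 1.
 * HypotheticalDSP stands in for a DSPContext-like struct. */
typedef struct HypotheticalDSP {
    dot_fn dot[MAX_ORDER];
} HypotheticalDSP;

static void hypothetical_dsp_init(HypotheticalDSP *c)
{
    c->dot[0] = dot_1; c->dot[1] = dot_2;
    c->dot[2] = dot_3; c->dot[3] = dot_4;
    c->dot[4] = dot_5; c->dot[5] = dot_6;
    c->dot[6] = dot_7; c->dot[7] = dot_8;
    /* An asm variant (sse2, sse4, altivec) would overwrite
     * these pointers here after CPU detection. */
}

/* Variant 2: a loop inlined at the call site, with the order
 * known only at run time and no indirect call. */
static inline int64_t dot_inline(const int32_t *a, const int32_t *b,
                                 int order)
{
    int64_t sum = 0;
    for (int i = 0; i < order; i++)
        sum += (int64_t)a[i] * b[i];
    return sum;
}
```

With variant 1 the caller does c.dot[order - 1](a, b) and pays for an indirect call on every sample; variant 2 avoids the call but leaves the loop length unknown to the compiler. Which of the two costs dominates is what the benchmarks above are measuring.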