[FFmpeg-devel] [PATCH] Move MLP's dot product to DSPContext
Ramiro Polla
ramiro.polla
Tue Apr 21 04:01:10 CEST 2009
On Mon, Apr 20, 2009 at 9:40 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Apr 20, 2009 at 02:29:09AM -0300, Ramiro Polla wrote:
>> On Mon, Apr 20, 2009 at 12:14 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Sun, Apr 19, 2009 at 10:10:05PM -0300, Ramiro Polla wrote:
>> >> The attached file moves MLP's dot product to DSPContext. The filter order
>> >> is a maximum of 8, and in the rematrix stage it's a maximum of 5+2
>> >> channels for MLP and 7+0 channels for TrueHD, so it all fits in 8
>> >> (hopefully) optimized functions.
>> >
>> > the functions are too small, the call overhead is too much
>> > 1-8 multiplications and 1-8 additions is not enough ...
>>
>> I thought that would happen too, but strangely there was a speedup.
>
> you wrote the whole function in asm() and that was slower?
Attached are three asm variants: sse2, sse4, and altivec.
Here are the benchmarks:
- on x86
  current: 3700ms
  array of functions in dspcontext:
    c      : 3300ms
    sse2   : 3400ms
    sse4   : 3200ms
  inlined in mlpdec.c:
    c      : 3500ms
    sse2   : 3200ms
    sse4   : 3100ms

- on x86_64 (can't run sse4)
  current: 2070ms
  array of functions in dspcontext:
    c      : 2600ms (badly vectorized)
    c      : 1920ms (not vectorized)
    sse2   : 2450ms
  inlined in mlpdec.c:
    c      : 2800ms (badly vectorized)
    c      : 1980ms (not vectorized)
    sse2   : 2450ms

- on ppc
  current: 9800ms
  array of functions in dspcontext:
    c      : 10800ms
    altivec: 10000ms
  inlined in mlpdec.c:
    c      : 9400ms
    altivec: 8800ms
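
For readers without the attachment, the "array of functions in dspcontext" variant measured above can be sketched roughly as follows. This is only an illustration of the idea (one fixed-length dot product per order from 1 to 8, selected through a table of function pointers), not the code from the patch: the names mlp_dot_fn, DotContext, dot_init and filter_tap_sum are made up for this sketch, and the int32 operands with an int64 accumulator are an assumption.

```c
#include <stdint.h>

/* Hypothetical prototype; the patch's real signature is not quoted here. */
typedef int64_t (*mlp_dot_fn)(const int32_t *a, const int32_t *b);

/* One fixed-length dot product per order. The constant trip count lets
 * the compiler unroll each loop completely. */
#define DEF_DOT(N)                                              \
static int64_t dot_##N(const int32_t *a, const int32_t *b)     \
{                                                               \
    int64_t sum = 0;                                            \
    for (int i = 0; i < (N); i++)                               \
        sum += (int64_t)a[i] * b[i];                            \
    return sum;                                                 \
}

DEF_DOT(1) DEF_DOT(2) DEF_DOT(3) DEF_DOT(4)
DEF_DOT(5) DEF_DOT(6) DEF_DOT(7) DEF_DOT(8)

/* Stand-in for the DSPContext member (the name is an assumption). */
typedef struct DotContext {
    mlp_dot_fn dot[8];          /* dot[n - 1] handles n elements */
} DotContext;

static void dot_init(DotContext *c)
{
    c->dot[0] = dot_1; c->dot[1] = dot_2;
    c->dot[2] = dot_3; c->dot[3] = dot_4;
    c->dot[4] = dot_5; c->dot[5] = dot_6;
    c->dot[6] = dot_7; c->dot[7] = dot_8;
    /* An optimized build would overwrite entries with asm versions here. */
}

/* The caller selects by order (1..8) and pays one indirect call. */
static int64_t filter_tap_sum(const DotContext *c, const int32_t *state,
                              const int32_t *coeff, int order)
{
    return c->dot[order - 1](state, coeff);
}
```

The "inlined in mlpdec.c" rows correspond to placing the same arithmetic directly in the decoder loop, which avoids the indirect call.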