[FFmpeg-devel] [PATCH + RFC] Faster ff_celp_lp_synthesis_filterf() (and failed SSE SIMD version)
Vitor Sessak
vitor1001
Wed Dec 16 18:10:39 CET 2009
Michael Niedermayer wrote:
> On Mon, Dec 14, 2009 at 10:21:47PM +0100, Vitor Sessak wrote:
>> Vitor Sessak wrote:
>>> Michael Niedermayer wrote:
>>>> On Sun, Dec 13, 2009 at 08:55:08PM +0100, Vitor Sessak wrote:
>>>> [...]
>>>>> + old_out3 = old_out2;
>>>>> + old_out2 = old_out1;
>>>>> + old_out1 = old_out0;
>>>>> + old_out0 = out[-i-1];
>>>>> +
>>>>> + val = filter_coeffs[i];
>>>>> +
>>>>> + out0 -= val * old_out0;
>>>>> + out1 -= val * old_out1;
>>>>> + out2 -= val * old_out2;
>>>>> + out3 -= val * old_out3;
>>>> old_out3 = out[-i-1];
>>>>
>>>> val = filter_coeffs[i];
>>>> out0 -= val * old_out3;
>>>> out1 -= val * old_out0;
>>>> out2 -= val * old_out1;
>>>> out3 -= val * old_out2;
>>>>
>>>> and similarly you can get rid of the other copies if you unroll it more
>>> Indeed, done. New patch attached.
>>> BTW, in my SSE code, there was a line of code missing:
>>>> DECLARE_ASM_CONST(16, uint32_t, mask[4]) = {0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x00000000};
>>>>
>> Err, this time without reinventing FFSWAP()...
>>
>> -Vitor
>
> do you want to be maintainer of celp_filters*?
Yes, done and patch committed.
-Vitor
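For readers following the unrolling discussion, here is a minimal sketch of the idea. It is not the committed FFmpeg code. It assumes the direct-form recurrence out[n] = in[n] - sum_{i=0}^{p-1} a[i] * out[n-i-1], with the filter memory stored in out[-p..-1], which is what the quoted out[-i-1] indexing suggests. The function names lp_synth_ref and lp_synth_block4 are hypothetical, and the block version assumes p >= 3 and a length that is a multiple of 4. Four outputs are computed per block: history-only terms are accumulated first, and the coefficient loop is unrolled by 4 so that h0..h3 rotate roles (as in Michael's suggestion) instead of being copied each iteration. The dependencies inside the block are resolved at the end.

```c
#include <assert.h>
#include <math.h>

/* Reference direct form: out[n] = in[n] - sum_{i=0}^{p-1} a[i] * out[n-i-1].
 * out[-p .. -1] must already hold the previous outputs (filter memory). */
static void lp_synth_ref(float *out, const float *a, const float *in,
                         int len, int p)
{
    for (int n = 0; n < len; n++) {
        float acc = in[n];
        for (int i = 0; i < p; i++)
            acc -= a[i] * out[n - i - 1];
        out[n] = acc;
    }
}

/* Hypothetical 4-outputs-per-block variant (sketch; assumes p >= 3, len % 4 == 0). */
static void lp_synth_block4(float *out, const float *a, const float *in,
                            int len, int p)
{
    assert(p >= 3 && len % 4 == 0);
    for (int n = 0; n < len; n += 4) {
        float *o = out + n;
        float out0 = in[n], out1 = in[n + 1], out2 = in[n + 2], out3 = in[n + 3];

        /* Coefficients i = 0..2: only the terms that read already-known history. */
        out0 -= a[0] * o[-1] + a[1] * o[-2] + a[2] * o[-3];
        out1 -= a[1] * o[-1] + a[2] * o[-2];
        out2 -= a[2] * o[-1];

        /* i >= 3: outputs 0..3 read o[-i-1], o[-i], o[-i+1], o[-i+2].
         * Before each step h0 = o[-i], h1 = o[-i+1], h2 = o[-i+2]; unrolling by 4
         * lets the registers rotate roles, so each step loads one new sample. */
        float h0 = o[-3], h1 = o[-2], h2 = o[-1], h3;
        int i = 3;
        for (; i + 3 < p; i += 4) {
            h3 = o[-i - 1];
            out0 -= a[i] * h3;     out1 -= a[i] * h0;
            out2 -= a[i] * h1;     out3 -= a[i] * h2;
            h2 = o[-i - 2];
            out0 -= a[i + 1] * h2; out1 -= a[i + 1] * h3;
            out2 -= a[i + 1] * h0; out3 -= a[i + 1] * h1;
            h1 = o[-i - 3];
            out0 -= a[i + 2] * h1; out1 -= a[i + 2] * h2;
            out2 -= a[i + 2] * h3; out3 -= a[i + 2] * h0;
            h0 = o[-i - 4];
            out0 -= a[i + 3] * h0; out1 -= a[i + 3] * h1;
            out2 -= a[i + 3] * h2; out3 -= a[i + 3] * h3;
        }
        /* Leftover coefficients: a plain shift is fine for this short tail. */
        for (; i < p; i++) {
            h3 = o[-i - 1];
            out0 -= a[i] * h3; out1 -= a[i] * h0;
            out2 -= a[i] * h1; out3 -= a[i] * h2;
            h2 = h1; h1 = h0; h0 = h3;
        }

        /* Intra-block dependencies: each output needs the ones just computed. */
        out1 -= a[0] * out0;
        out2 -= a[0] * out1 + a[1] * out0;
        out3 -= a[0] * out2 + a[1] * out1 + a[2] * out0;
        o[0] = out0; o[1] = out1; o[2] = out2; o[3] = out3;
    }
}
```

Because the additions happen in a different order, the block version matches the reference only up to float rounding. A comparison should therefore use a small tolerance rather than exact equality.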