[FFmpeg-devel] Fwd: Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization
Marc Hoffman
mmh
Thu Aug 23 04:30:49 CEST 2007
On Aug 22, 2007, at 10:03 PM, Michael Niedermayer wrote:
> Hi
>
> On Wed, Aug 22, 2007 at 08:31:12PM -0400, Marc Hoffman wrote:
>> On 8/21/07, Michael Niedermayer <michaelni at gmx.at> wrote:
>>> Hi
>>>
>>> On Tue, Aug 21, 2007 at 05:02:18PM -0400, Marc Hoffman wrote:
>>>> Hi,
>>>>
>>>> On 8/19/07, Justin Ruggles <justinruggles at bellsouth.net> wrote:
>>>>>
>>>>> 256 512 1024 2048 4096
>>>>> -------------------------------------------------------
>>>>> (1) fft32 46.1 101.0 205.0 494.4 1169.3
>>>>> (2) fftr2 7.6 16.0 34.1 101.3 258.5
>>>>> (3) fft ffmpeg 7.4 16.4 33.8 82.4 187.7
>>>>> (4) fftr4/2 6.7 13.6 28.2 74.0 169.8
>>>>>
>>>>> (1) fixedpoint 32bit simple RAD2 fft
>>>>> (2) simple rad2 first stage rad4
>>>>> (3) current lavc RAD2, first 2 stages, and middle pulled
>>>>> (4) rad4/2 with last stage rad2 if needed
>>>>
>>>> I'm adding a split radix to the list. Its a lot harder to optimize
>>>> because of the extra pointer manipulation and control
>>>> structures. I
>>>> did manage to unroll the W0 multipliers out but getting a clean
>>>> iteration space has not happened yet.
>>>
>>> could you also add the split radix fft from
>>> liba52-cvs/a52dec/liba52/imdct.c
>>> as comparission
>>>
>>> also djbfft and fftw2 would be interresting
>>> IIRC they all use split radix ffts
>>>
>>> and looking at them might help you with optimizing your code
>>
>> Your right if we do those techniques we would have to implement a
>> different optimized FFT for every possible data size we need. Unless
>> I'm missing something please advise.
>
> each of these implementations needs exactly 4 lines of code (not
> counting {}
> in liba52)
>
> now all the ffts are n^2 based so if we assume that the largest we
> want
> to support is 1mb then we would need 80 lines of code max
>
> and if you want to support all sizes up to 2^64 thats still just
> 256 lines
> and if you use a macro well its just 64+4 lines then
>
> that said iam not saying this aproach should be taken but if you
> cant find
> a faster and simpler solution then it might be a good choice
Michael,
This is a waste of our time, my judgement is to keep this structure
regular and simple. If we make a mess with macros its just difficult
for us to debug and maintain. I suggest we use the RAD4/RAD2 fft we
just developed its a good improvement over the existing code and much
more regular. Our bigger issue is to hook the fft through
DSPContext, and not struggle with the high level FFT we use IMHO.
The thing I have a much better appreciation for thanks to you, is how
much extra overhead a fixedpoint FFT has over a floating point one on
a floating point machine. All the normalization code is just
expensive, do you have a good idea of how we might move in the
direction of fixedpoint but still maintain floating point performance
on floating point machines?
Marc
More information about the ffmpeg-devel
mailing list