[FFmpeg-devel] Fwd: Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Sun Aug 19 02:44:16 CEST 2007

On 8/17/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> Hi
>
> On Fri, Aug 17, 2007 at 03:58:16PM +0200, Benjamin Larsson wrote:
> > Michael Niedermayer wrote:
> > > Hi
> > > [...]
> > >
> > > dont fear ill revert any cooley tukey fixed point FFTs which get commited
> > >
> >
> > Ok, then you can start by reverting revision 5.
> > http://svn.mplayerhq.hu/ffmpeg/trunk/libavcodec/ac3enc.c?view=markup&pathrev=5
> >
> > I don't see it as obvious that a split radix fft would be faster on
> > every fixed point platform out there.
>
> well on which is it not?
>
>
> > Would it really hurt to have
> > several fft algos available in the source?
>
> depends on their speed, accuracy, ...
> theres no sense in keeping things which are always inferrior
> keeping them until they have been optimized and benchmarked is all fine
> but we still are at step 0 (that is we dont have an implementation we rather
> disscuss what we could do if we did had one which we dont ...)
>
>
> > Without a radix-2 algo we
> > wouldn't have anything to compare against to back that claim anyway. I'm
> > not really doubting that a split radix fft would be faster in most cases
> > but I'd rather let the benchmark numbers prove the point.
>
> who did suggest that we shouldnt benchmark the code?
> iam just saying that id like to see a split radix fft in lavc, if someone
> wants to write a radix2 one thats fine i wont stop him and ill surely look
> at any benchmarks
>
here are some numbers.
	              256	512	1024  2048	4096
fft32	              46.1	101	205	  494.4	1169.3   fixedpoint 32bit
simple RAD2 fft
fftr2	              7.6	16	34.1	  101.3	258.5     simple rad2 first stage rad4.
fft ffmpeg	7.4	16.4	33.8	  82.4	187.7     current lavc RAD2, first 2
stages, and middle pulled
fftr4/2	       6.7	13.6	28.2	  74	       169.8     rad4/2 with last
stage rad2 if needed.

Still working on split radix maybe middle to end of next month, if I
get the time and figure out how to make that butterfly simple.

here is how the numbers are created timing loop printf should look
familiar to you.

yoda:~/fft mmh$ gcc -O3 fft.c -g -o fft && ./fft
fft.c: In function 'cmp':
fft.c:544: warning: passing argument 1 of 'pvec' from incompatible pointer type
fft.c:545: warning: passing argument 1 of 'pvec' from incompatible pointer type
fft.c: In function 'main':
fft.c:631: warning: passing argument 2 of 'ff_fft_permute' from
incompatible pointer type
fft.c:632: warning: passing argument 2 of 'ff_fft_permute' from
incompatible pointer type
error is 0.000879
4096
FFT32:time: 1169.3 us/transform [total time=1.20 s its=1024]
FFTR2:time: 258.5 us/transform [total time=1.06 s its=4096]
FFT-ffmpeg:time: 187.7 us/transform [total time=1.54 s its=8192]
FFTR4/2:time: 169.8 us/transform [total time=1.39 s its=8192]
yoda:~/fft mmh$

An interesting note is if the code is not compiled with -O3, the RAD4
is a 50% over the current RAD2 in lavc.

Also the RAD4 BFLY needs a new arrangement of twiddles with 25% more factors.

On just an estimate of efficiency of the SplitRad I guess we would see
maybe 2%-5% for something which we should probably leave up to an
architectural specific implementation.

I'm attaching the code for your reading pleasure...

Marc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Picture 1.png
Type: image/png
Size: 9194 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070818/415c1435/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: r4fft.c
Type: application/octet-stream
Size: 4568 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070818/415c1435/attachment.obj>