[FFmpeg-devel] [libav-devel] [RFC/RFBench] AVX FFT
Reinhard Tartler
siretart at tauware.de
Fri Apr 1 22:38:06 CEST 2011
On Fri, Apr 01, 2011 at 19:12:47 (CEST), Vitor Sessak wrote:
> Hi,
>
> The following patches add an AVX (an Intel x86 extension) FFT
> implementation. Since I do not have a Sandy Bridge myself, I have no idea
> of its performance. Benchmarks (e.g., using fft-test -s) are thus
> very welcome. Also welcome are suggestions for optimizing it further, in
> particular the 8-point FFT (in the T8_AVX macro), which is not much
> faster than the SSE version.
>
> One thing noteworthy about AVX is that it uses 256-bit registers, so
> now av_malloc needs to align the pointers to 32-byte boundaries. If this
> patch is accepted, I'll have to change a bunch of audio decoders to
> increase their buffers' alignment (note that AVX does not crash if a
> 256-bit load is done on a 128-bit aligned pointer, but it will cause a
> cache miss and thus a performance hit).
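
To illustrate the 32-byte alignment requirement described in the quoted text, here is a minimal C sketch, assuming a POSIX system that provides posix_memalign. The helper names is_aligned and alloc_aligned32 are hypothetical; they are not libavutil functions, and this is not how av_malloc is implemented.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helpers for illustration only; not part of libavutil. */

/* Returns 1 if p lies on an `align`-byte boundary, 0 otherwise.
 * `align` must be a non-zero power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Allocates `size` bytes on a 32-byte boundary, the natural alignment
 * for a 256-bit register. Returns NULL on failure; release with free(). */
static void *alloc_aligned32(size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, 32, size) != 0)
        return NULL;
    return p;
}
```

Offsetting a pointer returned by alloc_aligned32 by 16 bytes gives an address for which is_aligned(p, 16) holds but is_aligned(p, 32) does not, which is the 128-bit-aligned case the quoted text refers to.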
>> master/libavcodec/fft-test -s
FFT 512 test
Checking...
max:0.000008 e:3.92148e-08
Speed test...
time: 1.4 us/transform [total time=1.51 s its=1048576]
>> avx/libavcodec/fft-test -s
FFT 512 test
Checking...
zsh: segmentation fault (core dumped) avx/libavcodec/fft-test -s
>> gdb avx/libavcodec/fft-test
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/siretart/libav/libav/avx/libavcodec/fft-test...done.
(gdb) run -s
Starting program: /home/siretart/libav/libav/avx/libavcodec/fft-test -s
[Thread debugging using libthread_db enabled]
FFT 512 test
Checking...
Program received signal SIGSEGV, Segmentation fault.
fft32_avx () at /home/siretart/libav/libav/libavcodec/x86/fft_mmx.asm:568
568 PASS_SMALL_AVX 0, [cos_32], [cos_32+32]
(gdb) bt full
#0 fft32_avx () at /home/siretart/libav/libav/libavcodec/x86/fft_mmx.asm:568
No locals.
#1 0x0000000000407405 in fft64_avx () at /home/siretart/libav/libav/libavcodec/x86/fft_mmx.asm:796
No locals.
#2 0x0000000000407445 in fft128_avx () at /home/siretart/libav/libav/libavcodec/x86/fft_mmx.asm:796
No locals.
#3 0x0000000000407485 in fft256_avx () at /home/siretart/libav/libav/libavcodec/x86/fft_mmx.asm:796
No locals.
#4 0x0000000000407835 in fft512_interleave_avx () at /home/siretart/libav/libav/libavcodec/x86/fft_mmx.asm:797
No locals.
#5 0x0000000000407aaf in ff_fft_dispatch_interleave_avx ()
at /home/siretart/libav/libav/libavcodec/x86/fft_mmx.asm:797
No locals.
#6 0x0000000000401b4b in main (argc=<value optimized out>, argv=<value optimized out>)
at /home/siretart/libav/libav/libavcodec/fft-test.c:368
tab = <value optimized out>
tab1 = <value optimized out>
tab_ref = 0x690100
tab2 = <value optimized out>
it = <value optimized out>
i = <value optimized out>
c = <value optimized out>
do_speed = 1
err = 1
transform = TRANSFORM_FFT
do_inverse = <value optimized out>
s1 = {nbits = 9, inverse = 0, revtab = 0x6919c0, tmp_buf = 0x691e20, mdct_size = 0, mdct_bits = 0,
tcos = 0xca0000, tsin = 0x7ffff7625849, fft_permute = 0x409290 <ff_fft_permute_sse>,
fft_calc = 0x409190 <ff_fft_calc_avx>, imdct_calc = 0x409300 <ff_imdct_calc_sse>,
imdct_half = 0x409010 <ff_imdct_half_sse>, mdct_calc = 0x4059f0 <ff_mdct_calc_c>, fft_permutation = 1,
mdct_permutation = 0}
s = 0x7fffffffe600
m1 = {nbits = 53, inverse = 0, revtab = 0x7ffff7625640, tmp_buf = 0x7ffff793f990, mdct_size = 202,
mdct_bits = 0, tcos = 0x6, tsin = 0xbf, fft_permute = 0, fft_calc = 0x7ffff7ffe640,
imdct_calc = 0x100000001, imdct_half = 0x7ffff76256de, mdct_calc = 0x7ffff7ffd0ca <_rtld_global+138>,
fft_permutation = 4196022, mdct_permutation = 0}
m = 0x7fffffffe5a0
r1 = {nbits = 4195224, inverse = 0, sign_convention = 0, tcos = 0x1000007ff, tsin = 0x0, fft = {nbits = 0,
inverse = 0, revtab = 0x7ffff7ffe640, tmp_buf = 0x7fffffffe6c0, mdct_size = -134330936,
mdct_bits = 32767, tcos = 0x7fffffffe6e0, tsin = 0x7ffff7ffe2e8, fft_permute = 0xf63d4e2e,
fft_calc = 0x7ffff7de5fec, imdct_calc = 0, imdct_half = 0x7ffff7fe45c8, mdct_calc = 0x1,
fft_permutation = 0, mdct_permutation = 0}, rdft_calc = 0x1}
r = 0x7fffffffe510
d1 = {nbits = -163754450, inverse = 0, rdft = {nbits = -136422069, inverse = 32767,
sign_convention = -134229976, tcos = 0x3d8f538, tsin = 0x7fff0000002e, fft = {nbits = -6528,
inverse = 32767, revtab = 0x7ffff7597a6c, tmp_buf = 0x7ffff7597c60, mdct_size = -143700113,
mdct_bits = 32767, tcos = 0x7ffff75a3c48, tsin = 0x7fffffffe680, fft_permute = 0, fft_calc = 0x2,
imdct_calc = 0x7ffff7ddc608, imdct_half = 0x7ffff7ffe640, mdct_calc = 0x7ffff7ffec00,
fft_permutation = -134225176, mdct_permutation = 32767}, rdft_calc = 0}, costab = 0x7ffff7fe45c8,
csc2 = 0x7ffff7fe4000, dct_calc = 0x4006b6, dct32 = 0x7ffff75a4800}
d = 0x7fffffffe460
fft_nbits = 9
fft_size = 512
fft_size_2 = <value optimized out>
scale = 1
prng = {state = {3250495564, 2306601950, 2471027902, 2803700605, 3244271300, 828295292, 896232666,
2003252624, 2630297126, 278281651, 703771934, 2713658379, 185627147, 1845869981, 65037591, 515167334,
950953966, 2495981163, 869539144, 1017573762, 2187959630, 448742437, 3573286387, 772737749, 4086013702,
2723783272, 90637724, 1999990057, 1509936804, 647874073, 820414807, 198651198, 549524212, 3090657751,
1069157946, 3638807939, 2857319369, 249426330, 1341290762, 912573389, 2983032505, 1578888493, 186775408,
2963241266, 620391839, 581600772, 3500425447, 3511157190, 3509245070, 3930782797, 2430430146,
3101723090, 1456695348, 597768273, 2841785752, 3449146762, 2856126162, 1266718357, 3872858551,
2588111943, 3685614661, 1145658996, 3344543386, 3542870515}, index = 1024}
(gdb)
--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4