[FFmpeg-devel] [RFC/RFBench] AVX FFT

Michael Niedermayer michaelni at gmx.at
Fri Apr 1 20:16:39 CEST 2011


On Fri, Apr 01, 2011 at 07:12:47PM +0200, Vitor Sessak wrote:
> Hi,
>
> The following patches add an AVX (an Intel x86 extension) FFT
> implementation. Since I do not have a Sandy Bridge myself, I have no idea
> of its performance. Benchmarks (e.g., using fft-test -s) are thus
> very welcome. Also welcome are suggestions for optimizing it further, in
> particular the 8-point FFT (in the T8_AVX macro), which is not much
> faster than the SSE version.
>
> One thing noteworthy about AVX is that it uses 256-bit registers, so
> now av_malloc needs to align the pointers to 32-byte boundaries. If this
> patch is accepted, I'll have to change a bunch of audio decoders to
> increase their buffers' alignment (note that AVX does not crash if a
> 256-bit load is done on a 128-bit aligned pointer, but it will cause a
> cache miss and thus a performance hit).
>
> -Vitor
>
> PS: cross-posted to both lists since I'm interested in feedback from  
> both groups.
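
To make the 32-byte alignment point above concrete, here is a minimal
sketch in plain C. It does not touch av_malloc; AVX_ALIGN, is_aligned()
and alloc_floats_avx() are hypothetical names invented for this
illustration, and it assumes a C11 libc that provides aligned_alloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment for 256-bit (32-byte) AVX registers. */
#define AVX_ALIGN 32

/* Nonzero if p lies on an align-byte boundary; align must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: allocate room for n floats on a 32-byte boundary.
 * aligned_alloc() wants the size to be a multiple of the alignment, so the
 * byte count is rounded up. Release the buffer with free(). */
static float *alloc_floats_avx(size_t n)
{
    size_t bytes = n * sizeof(float);

    bytes = (bytes + AVX_ALIGN - 1) & ~(size_t)(AVX_ALIGN - 1);
    if (bytes == 0)
        bytes = AVX_ALIGN;
    return aligned_alloc(AVX_ALIGN, bytes);
}
```

A buffer from alloc_floats_avx() passes is_aligned(buf, 32), while
(char *)buf + 16 is only 16-byte aligned, which is the case the quoted
text describes as working but slower.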

Note: I don't know AVX (yet) and don't have a CPU that supports it, so the
review below is a bit lame. The code looks largely OK, though, to
someone who has not had time to look at the datasheets.


[...]
> --- a/libavcodec/x86/fft_mmx.asm
> +++ b/libavcodec/x86/fft_mmx.asm
> @@ -1,6 +1,7 @@
>  ;******************************************************************************
>  ;* FFT transform with SSE/3DNow optimizations
>  ;* Copyright (c) 2008 Loren Merritt
> +;* AVX ASM Copyright (c) 2011 Vitor Sessak
>  ;*
>  ;* This algorithm (though not any of the implementation details) is
>  ;* based on libdjbfft by D. J. Bernstein.
> @@ -49,11 +50,22 @@ endstruc
>  SECTION_RODATA
>  
>  %define M_SQRT1_2 0.70710678118654752440
> -ps_root2: times 4 dd M_SQRT1_2
> -ps_root2mppm: dd -M_SQRT1_2, M_SQRT1_2, M_SQRT1_2, -M_SQRT1_2
> -ps_p1p1m1p1: dd 0, 0, 1<<31, 0
> +%define M_COS_PI_1_8 0.923879532511287
> +%define M_COS_PI_3_8 0.38268343236509
> +
> +ps_cos16_1: dd 1.0, M_COS_PI_1_8, M_SQRT1_2, M_COS_PI_3_8, 1.0, M_COS_PI_1_8, M_SQRT1_2, M_COS_PI_3_8
> +ps_cos16_2: dd 0, M_COS_PI_3_8, M_SQRT1_2, M_COS_PI_1_8, 0, -M_COS_PI_3_8, -M_SQRT1_2, -M_COS_PI_1_8
> +
> +ps_root2: times 8 dd M_SQRT1_2
> +ps_root2mppm: dd -M_SQRT1_2, M_SQRT1_2, M_SQRT1_2, -M_SQRT1_2, -M_SQRT1_2, M_SQRT1_2, M_SQRT1_2, -M_SQRT1_2
> +ps_p1p1m1p1: dd 0, 0, 1<<31, 0, 0, 0, 1<<31, 0
>  ps_m1p1: dd 1<<31, 0
>
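
A quick sanity check of the new constants in the hunk above, as a sketch
in plain C. Reading ps_cos16_1 / ps_cos16_2 as cos(k*pi/8) and
+-sin(k*pi/8) for k = 0..3 is my interpretation, not something the patch
states. The TW_* macros and both function names are local to this example;
the values are copied from the hunk. No libm is needed, only the
half-angle identity cos^2(x) = (1 + cos(2x)) / 2.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the constants from the quoted hunk. */
#define TW_SQRT1_2    0.70710678118654752440
#define TW_COS_PI_1_8 0.923879532511287
#define TW_COS_PI_3_8 0.38268343236509

/* Local copies of the two tables, kept as doubles for the check. */
static const double cos16_1[8] = {
    1.0, TW_COS_PI_1_8, TW_SQRT1_2, TW_COS_PI_3_8,
    1.0, TW_COS_PI_1_8, TW_SQRT1_2, TW_COS_PI_3_8
};
static const double cos16_2[8] = {
    0.0,  TW_COS_PI_3_8,  TW_SQRT1_2,  TW_COS_PI_1_8,
    0.0, -TW_COS_PI_3_8, -TW_SQRT1_2, -TW_COS_PI_1_8
};

static double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* 1 if every (cos16_1[i], cos16_2[i]) pair lies on the unit circle. */
static int pairs_on_unit_circle(double tol)
{
    assert(tol > 0.0);
    for (size_t i = 0; i < 8; i++) {
        double r2 = cos16_1[i] * cos16_1[i] + cos16_2[i] * cos16_2[i];
        if (abs_diff(r2, 1.0) > tol)
            return 0;
    }
    return 1;
}

/* 1 if the two cosine constants satisfy the half-angle identity,
 * using cos(pi/4) = sqrt(1/2) and cos(3*pi/4) = -sqrt(1/2). */
static int half_angle_ok(double tol)
{
    return abs_diff(TW_COS_PI_1_8 * TW_COS_PI_1_8,
                    (1.0 + TW_SQRT1_2) / 2.0) <= tol &&
           abs_diff(TW_COS_PI_3_8 * TW_COS_PI_3_8,
                    (1.0 - TW_SQRT1_2) / 2.0) <= tol;
}
```

Both checks hold with a tolerance of 1e-12, so the constants are at least
self-consistent under that reading.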

> +perm1: dd 0x00, 0x02, 0x03, 0x01, 0x03, 0x00, 0X02, 0x01
> +perm2: dd 0x00, 0x01, 0x02, 0x03, 0x01, 0x00, 0X02, 0x03
                                                  ^
upper-case 'X' here (and in perm1 above); the other entries use lower case


[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Breaking DRM is a little like attempting to break through a door even
though the window is wide open and the only thing in the house is a bunch
of things you dont want and which you would get tomorrow for free anyway