[Ffmpeg-devel] [PATCH] SSE counterpart of ff_imdct_calc_3dn2

Thu Aug 24 20:25:06 CEST 2006

On Thu, Aug 24, 2006 at 10:50:41AM -0700, Loren Merritt wrote:
> On Thu, 24 Aug 2006, Luca Barbato wrote:
> 
> >Loren Merritt wrote:
> >>On Thu, 24 Aug 2006, Luca Barbato wrote:
> >>
> >>>Zuxy Meng wrote:
> >>>
> >>>>+    n = 1 << s->nbits;
> >>>>+    n8 = n >> 3;
> >>>[...]
> >>>>+    z += n8;
> >>>[...]
> >>>>+    for(k = 0; k < n8; k += 2) {
> >>>[...]
> >>>>+        asm (
> >>>>+            "movaps          %4, %%xmm0 \n\t"   // xmm0 = 0 1 2 3
> >>>>+            "movaps          %5, %%xmm1 \n\t"   // xmm1 = 4 5 6 7
> >>>[...]
> >>>>+            :"m"(z[k]), "m"(z[-2 - k])
> >>>
> >>>Am I missing something, or could it be unaligned?
> >>>The elements of z are 8 bytes, not 16.
> >>
> >>The array index is even.
> >I know
> >
> >>In order for n8 to be odd you'd need an 8-element
> >>FFT.
> >
> >I need an odd multiple of 8
> 
> But the FFT size can only be a power of 2.
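The alignment argument in the quoted exchange can be checked mechanically. `movaps` faults on a memory operand that is not 16-byte aligned, and each complex element is 8 bytes, so `z[k]` is safe only when its element offset from the buffer start is even. The sketch below assumes the buffer itself starts on a 16-byte boundary and uses a hypothetical `cplx` struct as a stand-in for an FFTComplex-style element; it walks the same indices as the quoted loop (`z` advanced by `n8`, `k` stepping by 2) and reports whether every access is aligned.

```c
#include <assert.h>

/* Hypothetical stand-in for an FFTComplex-style element: two floats. */
typedef struct { float re, im; } cplx;
static_assert(sizeof(cplx) == 8, "this sketch assumes 8-byte complex elements");

/* Returns 1 if every z[k] and z[-2 - k] touched by the quoted loop is
 * 16-byte aligned, assuming the original buffer is 16-byte aligned and
 * z has been advanced by n8 elements before the loop. */
static int loop_is_aligned(int nbits)
{
    int n  = 1 << nbits;
    int n8 = n >> 3;
    for (int k = 0; k < n8; k += 2) {
        /* byte offsets relative to the aligned start of the buffer */
        long fwd = (long)(n8 + k)     * (long)sizeof(cplx);
        long bwd = (long)(n8 - 2 - k) * (long)sizeof(cplx);
        if (fwd % 16 != 0 || bwd % 16 != 0)
            return 0;   /* an odd element offset lands on an 8-byte boundary */
    }
    return 1;
}
```

Since n8 = 2^(nbits - 3), it is odd only for nbits = 3: `loop_is_aligned(3)` (n = 8, n8 = 1) returns 0, while every power-of-two size from 16 upward returns 1.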

Strictly speaking, an FFT can be computed for any length, but efficiency
drops as the prime factors of the length grow, with large prime sizes
being the worst case. Of course, supporting sizes that are not powers of
two requires a very different implementation, and very few people are
interested in that case.
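The effect of large prime factors can be made concrete with a crude operation-count model. This is an assumption for illustration, not FFmpeg code: a mixed-radix Cooley-Tukey FFT that evaluates each radix-p stage as a direct p-point DFT costs roughly n * p per stage, so the total is about n times the sum of the prime factors of n (ignoring specialised prime-size algorithms). The function name `fft_cost_estimate` is hypothetical.

```c
#include <assert.h>

/* Rough cost model (illustrative assumption): n multiplied by the sum of
 * the prime factors of n, counted with multiplicity. A prime n degrades
 * to n * n, i.e. a plain O(n^2) DFT. */
static long fft_cost_estimate(long n)
{
    long sum = 0, m = n;
    for (long p = 2; p * p <= m; p++) {
        while (m % p == 0) {   /* strip every factor p */
            sum += p;
            m /= p;
        }
    }
    if (m > 1)
        sum += m;              /* leftover factor is prime */
    return n * sum;
}
```

Under this model, `fft_cost_estimate(1024)` gives 20480 and `fft_cost_estimate(1000)` gives 21000, so a size with only small prime factors is nearly as cheap as a power of two, while the prime size 1021 gives 1042441, roughly fifty times more work.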

Rich