[Ffmpeg-devel] [PATCH] SSE counterpart of ff_imdct_calc_3dn2
    Rich Felker
    dalias
    Thu Aug 24 20:25:06 CEST 2006

On Thu, Aug 24, 2006 at 10:50:41AM -0700, Loren Merritt wrote:
> On Thu, 24 Aug 2006, Luca Barbato wrote:
> 
> >Loren Merritt wrote:
> >>On Thu, 24 Aug 2006, Luca Barbato wrote:
> >>
> >>>Zuxy Meng wrote:
> >>>
> >>>>+    n = 1 << s->nbits;
> >>>>+    n8 = n >> 3;
> >>>[...]
> >>>>+    z += n8;
> >>>[...]
> >>>>+    for(k = 0; k < n8; k += 2) {
> >>>[...]
> >>>>+        asm (
> >>>>+            "movaps          %4, %%xmm0 \n\t"   // xmm0 = 0 1 2 3
> >>>>+            "movaps          %5, %%xmm1 \n\t"   // xmm1 = 4 5 6 7
> >>>[...]
> >>>>+            :"m"(z[k]), "m"(z[-2 - k])
> >>>
> >>>I'm missing something or it could be unaligned?
> >>>z is 8 byte not 16.
> >>
> >>The array index is even.
> >I know
> >
> >>In order for n8 to be odd you'd need an 8
> >>element fft.
> >
> >I need an odd multiple of 8
> 
> But fft size can only be a power of 2.
Strictly speaking, an FFT can be computed for any size, but as the prime
factors of the size get larger the efficiency becomes rather poor, with
the worst case being large prime sizes. Of course, you need a very
different implementation to support sizes that are not powers of two,
and very few people are interested in the non-power-of-two case.
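
Given power-of-two sizes, the alignment question quoted above comes down
to a little arithmetic. Here is a minimal sketch of that check, not code
from the patch. It assumes the base of z is 16-byte aligned and that each
element is 8 bytes; the names cplx8 and operands_aligned are made up for
illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the 8-byte complex element discussed above
 * (two floats); the name is illustrative, not taken from the patch. */
typedef struct { float re, im; } cplx8;

/* Returns 1 if, after "z += n8", both operands z[k] and z[-2 - k] sit at
 * byte offsets that are multiples of 16 from the original base of z, for
 * every even k in [0, n8). ASSUMPTION: the base itself is 16-byte aligned.
 * For nbits < 3 the loop body never runs, so the result is trivially 1. */
static int operands_aligned(int nbits)
{
    assert(nbits >= 0 && nbits < 31);

    int n  = 1 << nbits;
    int n8 = n >> 3;

    for (int k = 0; k < n8; k += 2) {
        long lo = (long)(n8 + k)     * (long)sizeof(cplx8); /* z[k]      */
        long hi = (long)(n8 - 2 - k) * (long)sizeof(cplx8); /* z[-2 - k] */
        if (lo % 16 != 0 || hi % 16 != 0)
            return 0;
    }
    return 1;
}
```

Under those assumptions, operands_aligned(3) returns 0 (the 8-element
case, where n8 is 1) and operands_aligned(4) returns 1, which matches the
observation that only an 8-element fft makes n8 odd.
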
Rich