[FFmpeg-devel] r9017 breaks WMA decoding on Intel Macs
Zuxy Meng
zuxy.meng
Sun May 27 16:01:21 CEST 2007
Hi,
2007/5/27, Guillaume Poirier <gpoirier at mplayerhq.hu>:
> Hi,
>
> On 27 May 07 at 14:52, Guillaume POIRIER wrote:
>
> > On 5/27/07, Guillaume POIRIER <poirierg at gmail.com> wrote:
> >> Any vorbis should do the trick. I'll try to narrow down the
> >> problem to
> >> see which part of the patch broke it.
> >
> > This hunk is what causes the regression:
>
> Of course this should read: "applying this hunk fixes the regression".
>
>
> > Index: fft_sse.c
> > ===================================================================
> > --- fft_sse.c (revision 9017)
> > +++ fft_sse.c (revision 6577)
> > @@ -100,33 +100,20 @@
> > i = nloops*8;
> > asm volatile(
> > "1: \n\t"
> > - "sub $32, %0 \n\t"
> > + "sub $16, %0 \n\t"
> > "movaps (%2,%0), %%xmm1 \n\t"
> > "movaps (%1,%0), %%xmm0 \n\t"
> > - "movaps 16(%2,%0), %%xmm5 \n\t"
> > - "movaps 16(%1,%0), %%xmm4 \n\t"
> > "movaps %%xmm1, %%xmm2 \n\t"
> > - "movaps %%xmm5, %%xmm6 \n\t"
> > "shufps $0xA0, %%xmm1, %%xmm1 \n\t"
> > "shufps $0xF5, %%xmm2, %%xmm2 \n\t"
> > - "shufps $0xA0, %%xmm5, %%xmm5 \n\t"
> > - "shufps $0xF5, %%xmm6, %%xmm6 \n\t"
> > "mulps (%3,%0,2), %%xmm1 \n\t" // cre*re cim*re
> > "mulps 16(%3,%0,2), %%xmm2 \n\t" // -cim*im cre*im
> > - "mulps 32(%3,%0,2), %%xmm5 \n\t" // cre*re cim*re
> > - "mulps 48(%3,%0,2), %%xmm6 \n\t" // -cim*im cre*im
> > "addps %%xmm2, %%xmm1 \n\t"
> > - "addps %%xmm6, %%xmm5 \n\t"
> > "movaps %%xmm0, %%xmm3 \n\t"
> > - "movaps %%xmm4, %%xmm7 \n\t"
> > "addps %%xmm1, %%xmm0 \n\t"
> > "subps %%xmm1, %%xmm3 \n\t"
> > - "addps %%xmm5, %%xmm4 \n\t"
> > - "subps %%xmm5, %%xmm7 \n\t"
> > "movaps %%xmm0, (%1,%0) \n\t"
> > "movaps %%xmm3, (%2,%0) \n\t"
> > - "movaps %%xmm4, 16(%1,%0) \n\t"
> > - "movaps %%xmm7, 16(%2,%0) \n\t"
> > "jg 1b \n\t"
> > :"+r"(i)
> > :"r"(p), "r"(p + nloops), "r"(cptr)
> >
> >
> > We're quite lucky, it's the shorter of the two hunks.
> >
> > Now I need to figure out what's wrong in that hunk.
>
> There's nothing wrong with this hunk!
> It just duplicates the original code and uses "original register
> number" + 4.
> Why on earth would it break on OSX and not on Linux?
>
> Is there some qualified guru out there who could enlighten me
> here?
Well, I was thinking that the code might need an additional epilogue to
handle the case when i%64 != 0, but since it doesn't break on Linux, I
have no idea :-(
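
To make the epilogue idea concrete, here is a rough scalar C sketch, not
the actual SSE code. It assumes 8-byte complex elements, so the unrolled
asm (sub $32) would consume 4 elements per iteration. The names cpx,
butterfly, pass_ref and pass_unrolled are made up for illustration, and
the twiddle layout is simplified to one coefficient per element rather
than the layout cptr really uses.

```c
#include <stddef.h>

/* Hypothetical stand-in for an 8-byte complex float element. */
typedef struct { float re, im; } cpx;

/* One butterfly: t = c*q, then p' = p + t and q' = p - t. */
static void butterfly(cpx *p, cpx *q, cpx c)
{
    cpx t = { c.re * q->re - c.im * q->im,
              c.re * q->im + c.im * q->re };
    cpx a = *p;
    p->re = a.re + t.re;  p->im = a.im + t.im;
    q->re = a.re - t.re;  q->im = a.im - t.im;
}

/* Reference pass: one element per iteration, counting down to 0. */
static void pass_ref(cpx *p, cpx *q, const cpx *c, size_t n)
{
    while (n--)
        butterfly(&p[n], &q[n], c[n]);
}

/* Unrolled pass: 4 elements per iteration, plus an epilogue that
 * handles the 0..3 elements left over when n is not a multiple of 4. */
static void pass_unrolled(cpx *p, cpx *q, const cpx *c, size_t n)
{
    size_t i = n;

    while (i >= 4) {
        i -= 4;
        butterfly(&p[i],     &q[i],     c[i]);
        butterfly(&p[i + 1], &q[i + 1], c[i + 1]);
        butterfly(&p[i + 2], &q[i + 2], c[i + 2]);
        butterfly(&p[i + 3], &q[i + 3], c[i + 3]);
    }

    /* Epilogue: without this, a count that is not a multiple of the
     * unroll factor would skip elements (or, with a signed countdown
     * like the asm's sub/jg, step past the start of the buffer). */
    while (i > 0) {
        i--;
        butterfly(&p[i], &q[i], c[i]);
    }
}
```

If the count is always a multiple of the unroll factor, the epilogue
never runs and the two passes do the same work, which would fit with
Linux being unaffected.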
--
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6