[FFmpeg-devel] [PATCH] SSE2 Xvid idct
Michael Niedermayer
michaelni
Mon Apr 14 04:26:19 CEST 2008
On Sun, Apr 13, 2008 at 10:10:21PM -0400, Alexander Strange wrote:
> On Sun, Apr 13, 2008 at 5:39 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > [..]
> > >>>>
> > >>>> #ifdef ARCH_X86_64
> > >>>> # define XMMS "%%xmm12"
> > >>>> #else
> > >>>> # define XMMS "%%xmm2"
> > >>>> #endif
> > >>>> s/%%xmm2/XMMS/
> > >>>>
> > >>>> #ifndef ARCH_X86_64
> > >>>> "movdqa %%xmm2, "spill" \n\t" \
> > >>>> #endif
> > >>>> ...
> > >>>> #ifndef ARCH_X86_64
> > >>>> "movdqa "spill", %%xmm2 \n\t" \
> > >>>> #endif
> > >>>>
> > >>>> or a
> > >>>> MOV_ONLY_ON32" %%xmm2, ...
> > >>>>
> > >>>>
> > >>>> And i think something similar can be done with ROW*
> > >>>
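A minimal sketch of the macro trick suggested above, for readers who want to see the expansion. It builds the asm template as an ordinary C string so that it can be inspected rather than assembled. XMMS and MOV_ONLY_ON32 are the names from the thread; SPILL, sketch_asm_text, the paddsw line, and the choice of "#" (the GNU as comment character on x86) as the 64-bit expansion are assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Assumption: stand-in for FFmpeg's configure-time ARCH_X86_64 macro. */
#if defined(__x86_64__) && !defined(ARCH_X86_64)
# define ARCH_X86_64 1
#endif

#ifdef ARCH_X86_64
# define XMMS          "%%xmm12"  /* a spare register exists, keep the value there */
# define MOV_ONLY_ON32 "#"        /* assumed: turns the line into an assembler comment */
#else
# define XMMS          "%%xmm2"   /* only xmm0-7 available, reuse xmm2 */
# define MOV_ONLY_ON32 "movdqa"   /* spill and reload are really needed */
#endif

/* Hypothetical spill slot; the real patch defines its own. */
#define SPILL "16(%0)"

/* Returns the template text. Inside a real asm statement with operands,
 * "%%" would come out as a single "%". */
static const char *sketch_asm_text(void)
{
    return
        MOV_ONLY_ON32 " %%xmm2, " SPILL " \n\t"  /* spill, 32-bit only */
        "paddsw " XMMS ", %%xmm0 \n\t"           /* placeholder work   */
        MOV_ONLY_ON32 " " SPILL ", %%xmm2 \n\t"; /* reload, 32-bit only */
}
```

Printing sketch_asm_text() on a 32-bit and a 64-bit build shows the two variants side by side. The appeal of this form is that the instruction sequence is written once and the #ifdef lives only in the macro definitions.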
> > >>> Done. The row part is already optimal on 64 since pshufhw handles it.
> > >>
> > >> I meant the
> > >>> "movdqa "ROW2", %%xmm4 \n\t" \
> > >>> "movdqa "ROW6", %%xmm6 \n\t" \
> > >> [...]
> > >>> "movdqa "ROW0", %%xmm4 \n\t" \
> > >>> "movdqa "ROW4", %%xmm6 \n\t" \
> > >>
> > >> they are unneeded on 64.
> > >
> > > Oh, that. Done:
> >
> >
> > [...]
> > > ///IDCT pass on columns, assuming rows 4-6 are zero.
> > ^
> > typo
>
> Fixed.
>
> > [...]
> > > iLLM_HEAD
> > > ASMALIGN(4)
> > > JNZ("%%ecx", "2f")
> > > JNZ("%%eax", "3f")
> > > JNZ("%%edx", "4f")
> > > JNZ("%%ebx", "5f")
> > > iLLM_PASS_SPARSE("%0")
> > > "jmp 6f \n\t"
> > > "2: \n\t"
> > > iMTX_MULT("4*16(%0)", MANGLE(iTab1), "#", PUT_EVEN(ROW4))
> > > "3: \n\t"
> > > iMTX_MULT("5*16(%0)", MANGLE(iTab4), ROUND(walkenIdctRounders+4*16), PUT_ODD(ROW5))
> > > JZ("%%edx", "1f")
> > > "4: \n\t"
> > > iMTX_MULT("6*16(%0)", MANGLE(iTab3), ROUND(walkenIdctRounders+5*16), PUT_EVEN(ROW6))
> > > JZ("%%ebx", "1f")
> > > "5: \n\t"
> > > iMTX_MULT("7*16(%0)", MANGLE(iTab2), ROUND(walkenIdctRounders+5*16), PUT_ODD(ROW7))
> > > iLLM_HEAD
> >
> > iLLM_HEAD is executed twice here
>
> That's intentional, it turned out to be the best way to handle it on
> 32-bit. (call it a speculative prefetch)
> But we can get rid of it for x86-64, so I did.
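The branch structure in the quoted asm can be sketched in plain C. This is only an illustration of the control flow, under the assumption (not visible in the excerpt) that ecx, eax, edx and ebx hold the OR of the coefficients of rows 4, 5, 6 and 7. The names column_pass, row_nonzero and choose_column_pass are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum column_pass { PASS_SPARSE, PASS_FULL };

/* OR together the eight coefficients of one row of an 8x8 block. */
static int row_nonzero(const int16_t *block, int row)
{
    int acc = 0;
    for (int i = 0; i < 8; i++)
        acc |= block[row * 8 + i];
    return acc != 0;
}

/* Mirrors the four JNZ tests: any nonzero row among 4..7 means those
 * rows get their row transform and the full column pass is used;
 * otherwise the cheaper sparse column pass is enough. */
static enum column_pass choose_column_pass(const int16_t *block)
{
    for (int row = 4; row < 8; row++)
        if (row_nonzero(block, row))
            return PASS_FULL;
    return PASS_SPARSE;
}
```

The asm version additionally skips the row transform for individual zero rows via the JZ tests; the sketch leaves that out and only shows the choice between the two column passes.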
>
> > > iLLM_PASS("%0")
> > > "6: \n\t"
> > > : "+r"(block)
> > > :
> > > : "%eax", "%ecx", "%edx", "%ebx", "memory");
> >
> > ebx + gcc + PIC -> problems
> >
> > Also the changes to existing code are missing this time ...
>
> changed to esi
> The others hadn't changed and I didn't want to repost them every time...
looks ok
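For context on the ebx remark: on 32-bit x86, gcc keeps the GOT pointer in ebx when building PIC code, so an asm statement that lists ebx as clobbered can fail to compile there. A small sketch of the esi variant follows. The helper row_has_coeffs is hypothetical and not part of the patch; the plain C branch is there only so that the sketch builds on other compilers and architectures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when any of the eight 16-bit
 * coefficients in one row is nonzero, 0 otherwise. */
static int row_has_coeffs(const int16_t *row)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int acc;
    __asm__ volatile(
        "movl    (%1), %%esi \n\t"  /* coefficients 0-1 */
        "orl    4(%1), %%esi \n\t"  /* coefficients 2-3 */
        "orl    8(%1), %%esi \n\t"  /* coefficients 4-5 */
        "orl   12(%1), %%esi \n\t"  /* coefficients 6-7 */
        "movl   %%esi, %0    \n\t"
        : "=r"(acc)
        : "r"(row)
        : "%esi", "memory");        /* esi as scratch, ebx left alone */
    return acc != 0;
#else
    int acc = 0;
    for (int i = 0; i < 8; i++)
        acc |= row[i];
    return acc != 0;
#endif
}
```

Listing esi in the clobber list tells the compiler not to place either operand in it, which is what the scratch use relies on.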
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I have never wished to cater to the crowd; for what I know they do not
approve, and what they approve I do not know. -- Epicurus