[FFmpeg-devel] [PATCH] SSE2 Xvid idct
Alexander Strange
astrange
Mon Apr 14 04:10:21 CEST 2008
On Sun, Apr 13, 2008 at 5:39 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> [..]
> >>>>
> >>>> #ifdef ARCH_X86_64
> >>>> # define XMMS "%%xmm12"
> >>>> #else
> >>>> # define XMMS "%%xmm2"
> >>>> #endif
> >>>> s/%%xmm2/XMMS/
> >>>>
> >>>> #ifndef ARCH_X86_64
> >>>> "movdqa %%xmm2, "spill" \n\t" \
> >>>> #endif
> >>>> ...
> >>>> #ifndef ARCH_X86_64
> >>>> "movdqa "spill", %%xmm2 \n\t" \
> >>>> #endif
> >>>>
> >>>> or a
> >>>> MOV_ONLY_ON32" %%xmm2, ...
> >>>>
> >>>>
> >>>> And I think something similar can be done with ROW*
> >>>
> >>> Done. The row part is already optimal on 64 since pshufhw handles it.
> >>
> >> I meant the
> >>> "movdqa "ROW2", %%xmm4 \n\t" \
> >>> "movdqa "ROW6", %%xmm6 \n\t" \
> >> [...]
> >>> "movdqa "ROW0", %%xmm4 \n\t" \
> >>> "movdqa "ROW4", %%xmm6 \n\t" \
> >>
> >> they are unneeded on 64.
> >
> > Oh, that. Done:
>
>
> [...]
> > ///IDCT pass on columns, assuming rows 4-6 are zero.
> ^
> typo
Fixed.
> [...]
> > iLLM_HEAD
> > ASMALIGN(4)
> > JNZ("%%ecx", "2f")
> > JNZ("%%eax", "3f")
> > JNZ("%%edx", "4f")
> > JNZ("%%ebx", "5f")
> > iLLM_PASS_SPARSE("%0")
> > "jmp 6f \n\t"
> > "2: \n\t"
> > iMTX_MULT("4*16(%0)", MANGLE(iTab1), "#", PUT_EVEN(ROW4))
> > "3: \n\t"
> > iMTX_MULT("5*16(%0)", MANGLE(iTab4), ROUND(walkenIdctRounders+4*16), PUT_ODD(ROW5))
> > JZ("%%edx", "1f")
> > "4: \n\t"
> > iMTX_MULT("6*16(%0)", MANGLE(iTab3), ROUND(walkenIdctRounders+5*16), PUT_EVEN(ROW6))
> > JZ("%%ebx", "1f")
> > "5: \n\t"
> > iMTX_MULT("7*16(%0)", MANGLE(iTab2), ROUND(walkenIdctRounders+5*16), PUT_ODD(ROW7))
> > iLLM_HEAD
>
> iLLM_HEAD is executed twice here
That's intentional: it turned out to be the best way to handle it on
32-bit (call it a speculative prefetch).
But we can get rid of it on x86-64, so I did.
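For readers following along, here is a minimal sketch of how a fragment of an asm template can be emitted on 32-bit only, using adjacent string literal concatenation. The names ONLY_ON32, DEMO_HEAD, DEMO_PASS and demo_template are hypothetical stand-ins, not the macros from the attached patch, and the sketch tests the compiler's __x86_64__ instead of FFmpeg's ARCH_X86_64 so that it compiles on its own. The strings are placeholders, not real instructions.

```c
#include <string.h>

/* Hypothetical helper: keep an asm fragment on 32-bit, drop it on x86-64. */
#if defined(__x86_64__)
#  define ONLY_ON32(insn) ""     /* 64-bit build: the fragment disappears */
#else
#  define ONLY_ON32(insn) insn   /* any other build: the fragment is emitted */
#endif

/* Placeholder fragments standing in for iLLM_HEAD and iLLM_PASS. */
#define DEMO_HEAD "head \n\t"
#define DEMO_PASS "pass \n\t"

/* Builds the template the same way the real asm string is assembled:
 * the compiler joins adjacent string literals into one. */
static const char *demo_template(void)
{
    return DEMO_HEAD
           "rows \n\t"
           ONLY_ON32(DEMO_HEAD)   /* second head only where it pays off */
           DEMO_PASS;
}
```

Printing demo_template() shows the second "head" line on a 32-bit build and omits it on x86-64, with no #ifdef in the middle of the asm body.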
> > iLLM_PASS("%0")
> > "6: \n\t"
> > : "+r"(block)
> > :
> > : "%eax", "%ecx", "%edx", "%ebx", "memory");
>
> ebx + gcc + PIC -> problems
>
> Also the changes to existing code are missing this time ...
Changed to esi.
The others hadn't changed and I didn't want to repost them every time...
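As background on the ebx remark: on 32-bit x86, gcc uses ebx as the GOT pointer when building PIC, and older gcc versions refuse an asm statement that lists it as clobbered. A minimal sketch of a scratch register declared as esi instead is below. The function any_nonzero is hypothetical and is not taken from the patch; the plain C fallback is only there so the sketch builds on non-x86 targets.

```c
#include <stdint.h>

/* Hypothetical example: returns 1 if any of four 32-bit words is nonzero.
 * Uses esi as scratch and declares it in the clobber list, avoiding ebx. */
static int any_nonzero(const uint32_t w[4])
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t acc;
    __asm__ volatile(
        "movl   (%1), %%esi \n\t"   /* load the first word            */
        "orl   4(%1), %%esi \n\t"   /* OR in the remaining words      */
        "orl   8(%1), %%esi \n\t"
        "orl  12(%1), %%esi \n\t"
        "movl %%esi, %0     \n\t"   /* copy the result to the output  */
        : "=r"(acc)
        : "r"(w)
        : "%esi", "cc", "memory");  /* esi, not ebx, so PIC is unaffected */
    return acc != 0;
#else
    return (w[0] | w[1] | w[2] | w[3]) != 0;
#endif
}
```

Because esi appears in the clobber list, the compiler will not place either operand in it, so the scratch use is safe on both 32-bit and 64-bit builds.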
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sse2-permute.diff
Type: application/octet-stream
Size: 1340 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080413/c78e1651/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sse2-xvid-idct.diff
Type: application/octet-stream
Size: 1826 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080413/c78e1651/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: idct_sse2_xvid.c
Type: application/octet-stream
Size: 15375 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080413/c78e1651/attachment-0002.obj>