[FFmpeg-devel] [FFmpeg-cvslog] r12171 - trunk/doc/optimization.txt
İsmail Dönmez
ismail
Thu Feb 21 20:16:39 CET 2008
Hi,
On Thu, Feb 21, 2008 at 9:11 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Thu, Feb 21, 2008 at 08:52:17PM +0200, İsmail Dönmez wrote:
> > Hi,
> >
> > >Author: melanson
> > >Date: Thu Feb 21 19:46:49 2008
> > >New Revision: 12171
> > >
> > >Log:
> > >minor English corrections
> > >
> > >
> > >Modified:
> > > trunk/doc/optimization.txt
> > [...]
> > > -Use asm() instead of intrinsics. Later requires a good optimizing compiler
> > > +Use asm() instead of intrinsics. The latter requires a good optimizing compiler
> > > which gcc is not.
> >
> > We all know this is FUD now. I know Michael still uses gcc 2.95, but
> > the world has moved on; GCC 4.3 is about to be released.
> > So please either back up these claims or note that this is not true for
> > recent GCCs.
>
> I use gcc r132072 ATM. I admit it's a few days old; do you claim that gcc
> was rewritten yesterday?
>
> Also, to back up the claim, the following was suggested to me a few days ago:
> -static inline void diff_pixels_mmx(DCTELEM *block, const uint8_t *s1, const uint8_t *s2, int stride)
> +static void diff_pixels_mmx(DCTELEM *block, const uint8_t *s1, const uint8_t *s2, long stride)
> {
> - asm volatile(
> - "pxor %%mm7, %%mm7 \n\t"
> - "mov $-128, %%"REG_a" \n\t"
> - ASMALIGN(4)
> - "1: \n\t"
> - "movq (%0), %%mm0 \n\t"
> - "movq (%1), %%mm2 \n\t"
> - "movq %%mm0, %%mm1 \n\t"
> - "movq %%mm2, %%mm3 \n\t"
> - "punpcklbw %%mm7, %%mm0 \n\t"
> - "punpckhbw %%mm7, %%mm1 \n\t"
> - "punpcklbw %%mm7, %%mm2 \n\t"
> - "punpckhbw %%mm7, %%mm3 \n\t"
> - "psubw %%mm2, %%mm0 \n\t"
> - "psubw %%mm3, %%mm1 \n\t"
> - "movq %%mm0, (%2, %%"REG_a") \n\t"
> - "movq %%mm1, 8(%2, %%"REG_a") \n\t"
> - "add %3, %0 \n\t"
> - "add %3, %1 \n\t"
> - "add $16, %%"REG_a" \n\t"
> - "jnz 1b \n\t"
> - : "+r" (s1), "+r" (s2)
> - : "r" (block+64), "r" ((long)stride)
> - : "%"REG_a
> - );
> + long offset = -128;
> + MOVQ_ZERO(mm7);
> + do {
> + asm volatile(
> + "movq (%0), %%mm0 \n\t"
> + "movq (%1), %%mm2 \n\t"
> + "movq %%mm0, %%mm1 \n\t"
> + "movq %%mm2, %%mm3 \n\t"
> + "punpcklbw %%mm7, %%mm0 \n\t"
> + "punpckhbw %%mm7, %%mm1 \n\t"
> + "punpcklbw %%mm7, %%mm2 \n\t"
> + "punpckhbw %%mm7, %%mm3 \n\t"
> + "psubw %%mm2, %%mm0 \n\t"
> + "psubw %%mm3, %%mm1 \n\t"
> + "movq %%mm0, (%2, %4) \n\t"
> + "movq %%mm1, 8(%2, %4) \n\t"
> + : : "r" (s1), "r" (s2), "r" (block+64), "r" (stride), "r" (offset)
> + : "memory");
> + s1 += stride;
> + s2 += stride;
> + offset += 16;
> + } while (offset < 0);
> }
>
> the effect that has on the generated asm is:
> .L143:
> .loc 3 241 0
> leaq (%rsi,%r8), %rdx
> leaq (%r10,%r8), %rax
> #APP
> # 241 "dsputil_mmx.c" 1
> movq (%rdx), %mm0
> movq (%rax), %mm2
> movq %mm0, %mm1
> movq %mm2, %mm3
> punpcklbw %mm7, %mm0
> punpckhbw %mm7, %mm1
> punpcklbw %mm7, %mm2
> punpckhbw %mm7, %mm3
> psubw %mm2, %mm0
> psubw %mm3, %mm1
> movq %mm0, (%rdi, %r9)
> movq %mm1, 8(%rdi, %r9)
>
> # 0 "" 2
> .loc 3 258 0
> #NO_APP
> addq %rcx, %r8
> .loc 3 259 0
> addq $16, %r9
> jne .L143
> -------------
>
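> As you can see, gcc injects 2 unneeded lea instructions into the innermost
> loop (the two leaq at the top of .L143, which recompute the s1 and s2
> addresses from a shared offset on every iteration).
> And I think this is very simple asm. If you want, you can try this with some
> complex code, but I recommend that you have a few bags for vomit ready ...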
If you can give an example based on complex asm, we can report a bug to
gcc. Just saying gcc is not a good optimizer does not help anyone. Do we
have another, better open source compiler? No. So if you have a better
example of bad asm being produced, we can ask the gcc developers.
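
As a possible starting point for such a report, here is a minimal standalone
sketch of the same 8x8 pixel difference: a plain-C reference plus an MMX
intrinsics variant that could be compiled with -O2 -S and compared against the
hand-written asm. The names diff_pixels_ref and diff_pixels_intrin, the int16_t
typedef standing in for DCTELEM, and the memcpy-based loads and stores are
assumptions made for illustration, not FFmpeg code. Nothing here says what any
particular gcc version generates from it; that is what the experiment would
show.

```c
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Assumption: DCTELEM is a 16-bit signed type, as the psubw and the
 * 8-byte movq stores in the quoted asm imply. */
typedef int16_t DCTELEM;

/* Plain-C reference: block[8*y + x] = s1[x] - s2[x] over an 8x8 area. */
static void diff_pixels_ref(DCTELEM *block, const uint8_t *s1,
                            const uint8_t *s2, long stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[8 * y + x] = (DCTELEM)(s1[x] - s2[x]);
        s1 += stride;
        s2 += stride;
    }
}

/* Hypothetical intrinsics variant of the same loop (not from the patch).
 * Falls back to the reference when MMX intrinsics are unavailable. */
static void diff_pixels_intrin(DCTELEM *block, const uint8_t *s1,
                               const uint8_t *s2, long stride)
{
#if defined(__MMX__)
    const __m64 zero = _mm_setzero_si64();           /* pxor mm7, mm7 */
    for (int y = 0; y < 8; y++) {
        __m64 a, b, lo, hi;
        memcpy(&a, s1, 8);                           /* movq (s1), mm0 */
        memcpy(&b, s2, 8);                           /* movq (s2), mm2 */
        /* Zero-extend bytes to words, then subtract (punpck*bw + psubw). */
        lo = _mm_sub_pi16(_mm_unpacklo_pi8(a, zero),
                          _mm_unpacklo_pi8(b, zero));
        hi = _mm_sub_pi16(_mm_unpackhi_pi8(a, zero),
                          _mm_unpackhi_pi8(b, zero));
        memcpy(block,     &lo, 8);                   /* words 0..3 of the row */
        memcpy(block + 4, &hi, 8);                   /* words 4..7 of the row */
        s1    += stride;
        s2    += stride;
        block += 8;
    }
    _mm_empty();                                     /* leave MMX state clean */
#else
    diff_pixels_ref(block, s1, s2, stride);
#endif
}
```

Calling both functions on the same two 8x8 inputs and comparing the 64 results
with memcmp gives a quick correctness check before looking at the generated
asm.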
Thanks!
--
Never learn by your mistakes, if you do you may never dare to try again