[FFmpeg-devel] Amazing intrinsics improvements in gcc 4

Wed Mar 19 19:43:20 CET 2008

On Wed, Mar 19, 2008 at 07:21:14PM +0100, Luca Barbato wrote:
> Michael Niedermayer wrote:
> > I thought some people here would be interested, as there were various claims
> > on gcc's abilities and improvements posted here lately ...
> 
> ------- Comment #23 From Uros Bizjak 2008-03-19 10:45 -------
> 
> As said in PR 19161:
> 
> The LCM infrastructure doesn't support mode switching in the way that would be
> usable for emms. Additionally, there are MANY problems expected when sharing
> x87 and MMX registers (i.e. handling of uninitialized x87 registers at the
> beginning of the function - this is the reason we don't implement x87 register
> passing ABI).
> 
> Automatic MMX vectorization is not exactly a much usable feature nowadays (we
> have SSE that works quite well here). Due to recent changes in MMX register
> allocation area, excellent code is produced using MMX intrinsics, I'm closing
> this bug as WONTFIX.
> 
> Also, auto-vectorization would produce either MMX or SSE code, but not both of
> them:
> 
> #define UNITS_PER_SIMD_WORD (TARGET_SSE ? 16 : UNITS_PER_WORD)
> 
> Seems Uros is fighting your battle and providing some interesting code.
> 
> Still, the root of the problem is that x86 sucks.

No, the root of the problem is that gcc devels are idiots.
gcc has no business putting emms anywhere; that's the programmer's job,
same as with free().
If I write SIMD code, I know what I am doing and I know I might have
to execute emms; I absolutely don't want gcc to guess it behind my back.
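
To illustrate what "the programmer's job" means here, a minimal sketch
follows. The helper name add_int16 is hypothetical and not from this thread;
it uses the standard <mmintrin.h> intrinsics _mm_add_pi16 (paddw) and
_mm_empty (emms), and falls back to plain C when MMX is not available.
The emms is issued once, by hand, after the MMX loop and before any code
that might touch x87 floating point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n 16-bit values,
 * with wrap-around on overflow. n must be a multiple of 4. */
void add_int16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    assert(n % 4 == 0);
#if defined(__MMX__)
    for (size_t i = 0; i < n; i += 4) {
        __m64 va, vb, vr;
        /* memcpy avoids alignment and aliasing assumptions. */
        memcpy(&va, a + i, sizeof(va));
        memcpy(&vb, b + i, sizeof(vb));
        vr = _mm_add_pi16(va, vb);      /* paddw */
        memcpy(dst + i, &vr, sizeof(vr));
    }
    /* Explicit emms, placed by the programmer: MMX and x87 share
     * registers, so clear the MMX state before returning to code
     * that may use floating point. */
    _mm_empty();
#else
    /* Scalar fallback for targets without MMX. */
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
#endif
}
```

The emms sits outside the loop, so its cost is paid once per call rather
than once per vector operation. Only the programmer knows where that
boundary is.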

Also, if I explicitly force gcc to use paddw:
void test(){
    w= __builtin_ia32_paddw(w,w);
    dw= (mmxdw)w;
}
-----
gcc-4.3 -mtune=pentium3 -march=pentium3 -fomit-frame-pointer -S -O3
generates:
        subl    $12, %esp
        movq    w, %mm0
        movq    %mm0, (%esp)
        paddw   %mm0, %mm0
        movq    %mm0, w
        movl    w, %eax
        movl    w+4, %edx
        movl    %eax, dw
        movl    %edx, dw+4
        addl    $12, %esp
        ret
-----
compared to
gcc-3.4 -mtune=pentium3 -march=pentium3 -fomit-frame-pointer -S -O3
        movq    w, %mm1
        paddw   %mm1, %mm1
        movq    %mm1, w
        movq    w, %mm0
        movq    %mm0, dw
        ret

So where is that "excellent code is produced using MMX intrinsics" ???
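
The test case above does not show the declarations of w, dw and mmxdw.
To reproduce the comparison, something like the following sketch could be
used. The typedefs are assumptions (a 4 x 16-bit vector for w and a
2 x 32-bit vector for mmxdw), the type name mmxw and the function name
test_paddw are hypothetical, and the fallback branch is only there so the
file also builds with compilers that lack the GCC builtin.

```c
/* Assumed declarations: the original mail does not show them. */
typedef short mmxw  __attribute__((vector_size(8)));  /* 4 x 16-bit, hypothetical name */
typedef int   mmxdw __attribute__((vector_size(8)));  /* 2 x 32-bit, layout assumed */

mmxw  w;
mmxdw dw;

void test_paddw(void)
{
#if defined(__MMX__) && defined(__GNUC__) && !defined(__clang__)
    /* GCC builtin that maps directly to the paddw instruction. */
    w = __builtin_ia32_paddw(w, w);
#else
    /* Generic vector add; the compiler picks the instruction. */
    w = w + w;
#endif
    /* Reinterpret the same 64 bits as another vector type; ideally
     * this costs nothing beyond a single 64-bit move. */
    dw = (mmxdw)w;
}
```

Compiling this with -S and the flags quoted above should make it possible
to compare the output of different gcc versions, assuming the real
declarations were close to these.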

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Let us carefully observe those good qualities wherein our enemies excel us
and endeavor to excel them, by avoiding what is faulty, and imitating what
is excellent in them. -- Plutarch