[Ffmpeg-devel] gcc4 support & MMX fixups (from Debian)
    Paweł Sikora 
    pluto
       
    Tue Jan 31 21:17:58 CET 2006
    
    
  
Hi all,
I have an implementation of transpose4x4 in C which uses gcc's vector
extensions. It puts less pressure on the register allocator and allows
gcc to schedule the code optimally.
Instantiating the attached patch, e.g. as foo(dst, src, 4, 4),
gives a nice piece of code:
[ x86-64 example ]
foo:    movd        4(%rsi), %mm0
        movd        (%rsi), %mm1
        movd        8(%rsi), %mm2
        movd        12(%rsi), %mm3
        punpcklbw   %mm0, %mm1
        punpcklbw   %mm3, %mm2
        movq        %mm1, %mm0
        punpckhwd   %mm2, %mm1
        punpcklwd   %mm2, %mm0
        movd        %mm1, 8(%rdi)
        punpckhdq   %mm1, %mm1
        movd        %mm0, (%rdi)
        punpckhdq   %mm0, %mm0
        movd        %mm1, 12(%rdi)
        movd        %mm0, 4(%rdi)
        ret
Actually, gcc-4.1 has a good optimizer, and hardcoded asm doesn't
bring any great performance boost; it only degrades code
scheduling.
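
For readers without the attachment, the general idea can be sketched
roughly as follows. This is not the attached patch. The names v8u8,
SHUF8, load4, store4 and transpose4x4 are hypothetical, the argument
order (dst, src, dst_stride, src_stride) is only a guess based on the
foo(dst, src, 4, 4) call above, and __builtin_shuffle is assumed to be
available (gcc 4.7 or later, so newer than the gcc-4.1 discussed here).
The shuffle patterns mirror the punpcklbw / punpcklwd / punpckhwd steps
in the listing.

```c
#include <stdint.h>
#include <string.h>

/* 8-byte vector of unsigned bytes, the width of one MMX register. */
typedef uint8_t v8u8 __attribute__((vector_size(8)));

/* Two-input byte shuffle: indices 0-7 select from a, 8-15 from b.
 * clang spells the builtin differently, so both forms are covered. */
#if defined(__clang__)
#define SHUF8(a, b, ...) __builtin_shufflevector((a), (b), __VA_ARGS__)
#else
#define SHUF8(a, b, ...) __builtin_shuffle((a), (b), (v8u8){__VA_ARGS__})
#endif

/* Load 4 bytes into the low half of a vector (like movd); rest is zero. */
static inline v8u8 load4(const uint8_t *p)
{
    v8u8 v = {0};
    memcpy(&v, p, 4);
    return v;
}

/* Store the low (half = 0) or high (half = 1) 4 bytes of v. */
static inline void store4(uint8_t *p, v8u8 v, int half)
{
    memcpy(p, (const uint8_t *)&v + 4 * half, 4);
}

/* Transpose a 4x4 block of bytes; strides are in bytes. */
void transpose4x4(uint8_t *dst, const uint8_t *src,
                  int dst_stride, int src_stride)
{
    v8u8 r0 = load4(src + 0 * src_stride);
    v8u8 r1 = load4(src + 1 * src_stride);
    v8u8 r2 = load4(src + 2 * src_stride);
    v8u8 r3 = load4(src + 3 * src_stride);

    /* Interleave bytes of row pairs (punpcklbw): a0 b0 a1 b1 ... */
    v8u8 t0 = SHUF8(r0, r1, 0, 8, 1, 9, 2, 10, 3, 11);
    v8u8 t1 = SHUF8(r2, r3, 0, 8, 1, 9, 2, 10, 3, 11);

    /* Interleave 16-bit pairs (punpcklwd / punpckhwd):
     * lo = a0 b0 c0 d0 a1 b1 c1 d1, hi = a2 b2 c2 d2 a3 b3 c3 d3 */
    v8u8 lo = SHUF8(t0, t1, 0, 1, 8, 9, 2, 3, 10, 11);
    v8u8 hi = SHUF8(t0, t1, 4, 5, 12, 13, 6, 7, 14, 15);

    /* Each 4-byte half is one output row. */
    store4(dst + 0 * dst_stride, lo, 0);
    store4(dst + 1 * dst_stride, lo, 1);
    store4(dst + 2 * dst_stride, hi, 0);
    store4(dst + 3 * dst_stride, hi, 1);
}
```

Because the shuffles are ordinary C expressions on vector values, the
compiler is free to pick registers and order the instructions itself,
which is the point made above about register allocation and scheduling.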
BR,
Pawel.
-- 
to_be || !to_be == 1, to_be | ~to_be == -1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ffmpeg-gcc4.patch
Type: text/x-diff
Size: 1774 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20060131/7a802117/attachment.patch>