[Ffmpeg-devel] gcc4 support & MMX fixups (from Debian)
Paweł Sikora
pluto
Tue Jan 31 21:17:58 CET 2006
Hi all,
I have an implementation of transpose4x4 in C which uses gcc's vector
extensions. It puts less pressure on the register allocator and allows
optimal code scheduling.
Instantiating the attached patch, e.g. as foo(dst, src, 4, 4),
gives a nice piece of code:
[ x86-64 example ]
foo: movd 4(%rsi), %mm0
movd (%rsi), %mm1
movd 8(%rsi), %mm2
movd 12(%rsi), %mm3
punpcklbw %mm0, %mm1
punpcklbw %mm3, %mm2
movq %mm1, %mm0
punpckhwd %mm2, %mm1
punpcklwd %mm2, %mm0
movd %mm1, 8(%rdi)
punpckhdq %mm1, %mm1
movd %mm0, (%rdi)
punpckhdq %mm0, %mm0
movd %mm1, 12(%rdi)
movd %mm0, 4(%rdi)
ret
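For readers without the attachment, the unpack sequence above can be sketched in C with gcc's generic vector extensions. This is not the attached patch, only an illustration under stated assumptions: the name transpose4x4_sketch is hypothetical, the last two arguments are assumed to be the destination and source strides in bytes, and __builtin_shuffle is a later gcc addition (4.7) that gcc-4.1 does not have.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 8 x 8-bit lanes, the same shape as one MMX register. */
typedef uint8_t v8u8 __attribute__((vector_size(8)));

/* Two-input shuffles: indices 0..7 pick from a, 8..15 pick from b. */
#if defined(__clang__)
#define UNPACK_LO_BYTES(a, b) __builtin_shufflevector(a, b, 0, 8, 1, 9, 2, 10, 3, 11)
#define UNPACK_LO_WORDS(a, b) __builtin_shufflevector(a, b, 0, 1, 8, 9, 2, 3, 10, 11)
#define UNPACK_HI_WORDS(a, b) __builtin_shufflevector(a, b, 4, 5, 12, 13, 6, 7, 14, 15)
#else
#define UNPACK_LO_BYTES(a, b) __builtin_shuffle(a, b, (v8u8){0, 8, 1, 9, 2, 10, 3, 11})
#define UNPACK_LO_WORDS(a, b) __builtin_shuffle(a, b, (v8u8){0, 1, 8, 9, 2, 3, 10, 11})
#define UNPACK_HI_WORDS(a, b) __builtin_shuffle(a, b, (v8u8){4, 5, 12, 13, 6, 7, 14, 15})
#endif

/* Hypothetical name and signature; strides are assumed to be in bytes. */
static void transpose4x4_sketch(uint8_t *dst, const uint8_t *src,
                                int dst_stride, int src_stride)
{
    v8u8 r0 = {0}, r1 = {0}, r2 = {0}, r3 = {0};

    assert(dst && src);

    /* Load 4 bytes of each source row into the low half (like movd). */
    memcpy(&r0, src + 0 * src_stride, 4);
    memcpy(&r1, src + 1 * src_stride, 4);
    memcpy(&r2, src + 2 * src_stride, 4);
    memcpy(&r3, src + 3 * src_stride, 4);

    /* Interleave bytes of rows 0/1 and rows 2/3 (like punpcklbw). */
    v8u8 ab = UNPACK_LO_BYTES(r0, r1);   /* a0 b0 a1 b1 a2 b2 a3 b3 */
    v8u8 cd = UNPACK_LO_BYTES(r2, r3);   /* c0 d0 c1 d1 c2 d2 c3 d3 */

    /* Interleave 16-bit pairs (like punpcklwd / punpckhwd). */
    v8u8 lo = UNPACK_LO_WORDS(ab, cd);   /* a0 b0 c0 d0 a1 b1 c1 d1 */
    v8u8 hi = UNPACK_HI_WORDS(ab, cd);   /* a2 b2 c2 d2 a3 b3 c3 d3 */

    /* Each 8-byte result holds two transposed rows of 4 bytes. */
    memcpy(dst + 0 * dst_stride, (const uint8_t *)&lo,     4);
    memcpy(dst + 1 * dst_stride, (const uint8_t *)&lo + 4, 4);
    memcpy(dst + 2 * dst_stride, (const uint8_t *)&hi,     4);
    memcpy(dst + 3 * dst_stride, (const uint8_t *)&hi + 4, 4);
}
```

Calling transpose4x4_sketch(dst, src, 4, 4) on a packed 4x4 byte block corresponds to the foo(dst, src, 4, 4) instantiation. No register names appear in the source, so the compiler is free to allocate and schedule as it sees fit.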
Actually, gcc-4.1 has a good optimizer and generates good asm. Hand-coded
asm doesn't bring an incredible performance boost, only a degradation
of code scheduling.
BR,
Pawel.
--
to_be || !to_be == 1, to_be | ~to_be == -1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ffmpeg-gcc4.patch
Type: text/x-diff
Size: 1774 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20060131/7a802117/attachment.patch>