[Ffmpeg-devel] gcc4 support & MMX fixups (from Debian)
matthieu castet
castet.matthieu
Tue Jan 31 21:25:29 CET 2006
Hi Paweł,
Paweł Sikora wrote:
> Hi all,
>
> I have an implementation of transpose4x4 in C which uses gcc's vector
> extensions. It puts less pressure on the register allocator and allows
> optimal code scheduling.
>
> Instantiating the attached patch, e.g. as foo(dst, src, 4, 4),
> gives a nice piece of code:
>
> [ x86-64 example ]
>
> foo:
>         movd      4(%rsi), %mm0
>         movd      (%rsi), %mm1
>         movd      8(%rsi), %mm2
>         movd      12(%rsi), %mm3
>         punpcklbw %mm0, %mm1
>         punpcklbw %mm3, %mm2
>         movq      %mm1, %mm0
>         punpckhwd %mm2, %mm1
>         punpcklwd %mm2, %mm0
>         movd      %mm1, 8(%rdi)
>         punpckhdq %mm1, %mm1
>         movd      %mm0, (%rdi)
>         punpckhdq %mm0, %mm0
>         movd      %mm1, 12(%rdi)
>         movd      %mm0, 4(%rdi)
>         ret
>
> Actually, gcc-4.1 has a good optimizer and generates good asm. Hardcoding
> the asm doesn't bring a big performance boost, only a degradation
> of code scheduling.
Could you post a benchmark comparing the two versions?
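
For anyone following the quoted asm, here is a rough portable C model of the same unpack sequence, checked against a naive transpose. It is only a sketch to make the data flow readable. It is not the attached patch and does not use gcc's vector extensions. The function names are made up for illustration, and I am assuming the argument order (dst, src, dst_stride, src_stride), which the foo(dst, src, 4, 4) call above does not actually confirm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of punpcklbw on the low 4 bytes of two registers:
 * out = a0 b0 a1 b1 a2 b2 a3 b3 */
void unpack_lo_bytes(const uint8_t a[4], const uint8_t b[4], uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Model of punpcklwd (half = 0) or punpckhwd (half = 1):
 * interleave the two 16-bit words of the chosen half of a and b. */
void unpack_words(const uint8_t a[8], const uint8_t b[8], int half,
                  uint8_t out[8])
{
    const uint8_t *pa = a + 4 * half;
    const uint8_t *pb = b + 4 * half;
    for (int w = 0; w < 2; w++) {
        out[4 * w]     = pa[2 * w];
        out[4 * w + 1] = pa[2 * w + 1];
        out[4 * w + 2] = pb[2 * w];
        out[4 * w + 3] = pb[2 * w + 1];
    }
}

/* 4x4 byte transpose following the same steps as the quoted asm.
 * Parameter order is an assumption, see above. */
void transpose4x4_unpack(uint8_t *dst, const uint8_t *src,
                         int dst_stride, int src_stride)
{
    uint8_t ab[8], cd[8], lo[8], hi[8];

    assert(dst_stride >= 4 && src_stride >= 4);

    unpack_lo_bytes(src, src + src_stride, ab);                      /* rows 0,1 */
    unpack_lo_bytes(src + 2 * src_stride, src + 3 * src_stride, cd); /* rows 2,3 */
    unpack_words(ab, cd, 0, lo);  /* a0 b0 c0 d0  a1 b1 c1 d1 */
    unpack_words(ab, cd, 1, hi);  /* a2 b2 c2 d2  a3 b3 c3 d3 */

    /* The movd / punpckhdq / movd pairs store the low and high dwords. */
    memcpy(dst,                  lo,     4);
    memcpy(dst + dst_stride,     lo + 4, 4);
    memcpy(dst + 2 * dst_stride, hi,     4);
    memcpy(dst + 3 * dst_stride, hi + 4, 4);
}

/* Naive reference transpose used to check the model. */
void transpose4x4_ref(uint8_t *dst, const uint8_t *src,
                      int dst_stride, int src_stride)
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[x * dst_stride + y] = src[y * src_stride + x];
}
```

This says nothing about speed, it only shows what the instruction sequence computes. A real comparison would still need timings of the vector-extension version against the existing hand-written asm.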