[FFmpeg-devel] [RFC] Loop unrolling in C code for 'vector_fmul_*' functions
Vadim Lebedev
vadim
Sun Jan 13 18:05:48 CET 2008
I'm building your program with gcc 4.1.2 and running it as follows:
gcc -O3 -fomit-frame-pointer -msse -o vector_fmul_test vector_fmul_test.c
./vector_fmul_test 2000
And the output is:
Function: 'vector_fmul_c', time=73.910 (cycles/element=288.713)
Function: 'vector_fmul_c_unrolled', time=73.010 (cycles/element=285.195)
Function: 'vector_fmul_c_other_unrolled', time=72.999 (cycles/element=285.152)
Function: 'vector_fmul_c_simd', time=0.141 (cycles/element=0.552)
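The reported figures are related by a fixed factor: in every row, cycles/element divided by time is about 3.906, which equals 2000 / 512. The sketch below checks that arithmetic. It assumes, without confirmation from the test program's source, that the command-line argument 2000 is a clock rate in MHz and that each run processes 512e6 elements in total. The helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not taken from vector_fmul_test.c: converts an
 * elapsed time in seconds into cycles per element for an ASSUMED clock
 * rate (MHz) and an ASSUMED total number of processed elements. */
static double cycles_per_element(double seconds, double clock_mhz,
                                 double total_elements)
{
    assert(total_elements > 0.0);
    return seconds * clock_mhz * 1e6 / total_elements;
}

/* Recomputes the four rows quoted above under the assumption of
 * 2000 MHz and 512e6 elements, prints both values for each row, and
 * returns the largest absolute difference between them. */
static double check_reported_rows(void)
{
    static const double seconds[4]  = { 73.910, 73.010, 72.999, 0.141 };
    static const double reported[4] = { 288.713, 285.195, 285.152, 0.552 };
    double worst = 0.0;
    int i;

    for (i = 0; i < 4; i++) {
        double c = cycles_per_element(seconds[i], 2000.0, 512e6);
        double d = c > reported[i] ? c - reported[i] : reported[i] - c;

        printf("time=%.3f reported=%.3f recomputed=%.3f\n",
               seconds[i], reported[i], c);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

If that assumption holds, the cycle counts scale directly with the value passed on the command line, so they are only as accurate as that value. Note that the /proc/cpuinfo listing in the P.S. reports cpu MHz as 1000.000 at the time it was read.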
Any idea why everything except the SIMD case is so slow?
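For readers who do not have vector_fmul_test.c at hand, the function names in the output suggest a plain C loop and hand-unrolled variants of it. The sketch below shows what such a pair could look like. It assumes the operation is an in-place element-wise product (dst[i] *= src[i]). The function names, the unroll factor of four, and the test data are all hypothetical and are not taken from the actual test program.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED semantics: in-place element-wise product, dst[i] *= src[i]. */
static void fmul_plain(float *dst, const float *src, size_t len)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* Same operation unrolled by four (an arbitrary choice for this sketch),
 * followed by a scalar loop for the remaining len % 4 elements. */
static void fmul_unrolled4(float *dst, const float *src, size_t len)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    for (; i + 4 <= len; i += 4) {
        dst[i + 0] *= src[i + 0];
        dst[i + 1] *= src[i + 1];
        dst[i + 2] *= src[i + 2];
        dst[i + 3] *= src[i + 3];
    }
    for (; i < len; i++)
        dst[i] *= src[i];
}

/* Runs both variants on a length that is not a multiple of four and
 * returns 1 if every output element matches, 0 otherwise. */
static int fmul_variants_agree(void)
{
    float a[10], b[10], s[10];
    size_t i;

    for (i = 0; i < 10; i++) {
        a[i] = b[i] = (float)i + 0.5f;
        s[i] = 2.0f;
    }
    fmul_plain(a, s, 10);
    fmul_unrolled4(b, s, 10);
    for (i = 0; i < 10; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Both variants perform the same multiplications in the same order per element, so any speed difference between them comes from loop overhead and from how the compiler schedules the body, not from the arithmetic.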
P.S.
This is my /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz
stepping : 6
cpu MHz : 1000.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 3995.28
clflush size : 64
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz
stepping : 6
cpu MHz : 1000.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 3990.15
clflush size : 64