[Ffmpeg-devel] int vs. float profiler, take 2
Corey Hickey
bugfood-ml
Sat May 21 04:13:53 CEST 2005
Mike Melanson wrote:
> Hi,
> Since the first version of my little profiler generated a reasonable
> amount of activity, attached is a slightly improved version. This one
> does the following:
>
> * runs all 4 of the functions n times as a cache warmup (n=1000 in the
> code); this actually does help with cycle count consistency
> * fetches an overhead cycle count as a baseline
> * C code can fetch iteration count
>
> The ASM code has ITERATIONS set to 1 right now. I would be interested to
> know the results from varying CPUs using 1, 10, and 100 iterations.
>
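The measurement scheme described in the quoted list (warm-up runs, then a baseline overhead reading, then the timed call) might be sketched in C roughly as follows. This is only an illustration, not Mike's actual code: the names profile, counter_fn, work_fn, fake_now and fake_work are hypothetical, and the fake counter stands in for whatever cycle counter the real profiler reads, so that the sketch behaves the same on any machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types: any monotonically increasing counter will do. */
typedef uint64_t (*counter_fn)(void);
typedef void (*work_fn)(int iterations);

struct sample {
    uint64_t cycles;   /* raw count around the measured call */
    uint64_t overhead; /* count around an empty measurement  */
};

/* Warm up, take a baseline, then time one call of work(iterations). */
static struct sample profile(counter_fn now, work_fn work,
                             int iterations, int warmup_runs)
{
    struct sample s;
    uint64_t start;
    int i;

    assert(now && work);

    /* 1. Warm-up: run the function repeatedly so caches are primed. */
    for (i = 0; i < warmup_runs; i++)
        work(iterations);

    /* 2. Baseline: two back-to-back counter reads with nothing between. */
    start = now();
    s.overhead = now() - start;

    /* 3. Timed run: the raw figure still contains the overhead. */
    start = now();
    work(iterations);
    s.cycles = now() - start;

    return s;
}

/* Made-up deterministic counter, for demonstration only:
 * every read costs 14 ticks and every operation costs 9 ticks. */
static uint64_t fake_ticks;

static uint64_t fake_now(void)
{
    fake_ticks += 14;
    return fake_ticks;
}

static void fake_work(int iterations)
{
    fake_ticks += 9u * (uint64_t)iterations;
}
```

Calling profile(fake_now, fake_work, 10, 1000) with these stand-ins gives an overhead of 14 and a raw count of 104, that is, 90 ticks of work plus the 14 ticks of measurement overhead.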
Here are results from all the machines I have access to from here. There's
still some variation, but the differences between CPUs are more significant
than the differences between runs.
For all tests:
NASM version 0.98.38 compiled on Dec 30 2004
I can't keep the gcc versions identical because I don't have root on
many of these machines. I don't know whether that makes a difference.
INTEL CPUs
=====================================================================
processor : 0
vendor_id : GenuineIntel
cpu family : 5
model : 4
model name : Pentium MMX
stepping : 3
cpu MHz : 199.488
fdiv_bug : no
hlt_bug : no
f00f_bug : yes
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 mmx
bogomips : 393.21
gcc version 3.3.4 (Debian 1:3.3.4-13)
warming up with 1000 cycles...
integer_adder(), 1 adds, 14 cycles used (overhead = 14)
float_adder(), 1 adds, 74 cycles used (overhead = 14)
integer_mult(), 1 mults, 24 cycles used (overhead = 15)
float_mult(), 1 mults, 73 cycles used (overhead = 14)
warming up with 1000 cycles...
integer_adder(), 10 adds, 23 cycles used (overhead = 14)
float_adder(), 10 adds, 1010 cycles used (overhead = 14)
integer_mult(), 10 mults, 105 cycles used (overhead = 15)
float_mult(), 10 mults, 1011 cycles used (overhead = 14)
warming up with 1000 cycles...
integer_adder(), 100 adds, 113 cycles used (overhead = 14)
float_adder(), 100 adds, 10295 cycles used (overhead = 14)
integer_mult(), 100 mults, 915 cycles used (overhead = 15)
float_mult(), 100 mults, 10315 cycles used (overhead = 14)
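A note on reading these figures: the "cycles used" value appears to include the measurement overhead (with 1 add, integer_adder() reports 14 cycles against an overhead of 14), so a per-operation cost is better estimated after subtracting it. A minimal sketch in C, assuming the output format shown above; parse_result and net_cycles_per_op are hypothetical helper names, not part of the profiler.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one profiler output line, for example
 *   "float_adder(), 100 adds, 10295 cycles used (overhead = 14)"
 * name must point to a buffer of at least 32 bytes.
 * Returns 0 on success, -1 if the line does not match the format. */
static int parse_result(const char *line, char *name, long *ops,
                        long *cycles, long *overhead)
{
    char unit[16]; /* "adds," or "mults,"; read and discarded */
    int fields;

    assert(line != NULL);
    fields = sscanf(line,
                    "%31[^(](), %ld %15s %ld cycles used (overhead = %ld)",
                    name, ops, unit, cycles, overhead);
    return fields == 5 ? 0 : -1;
}

/* Net cost of one operation: subtract the overhead, divide by the count. */
static double net_cycles_per_op(long ops, long cycles, long overhead)
{
    assert(ops > 0);
    return (double)(cycles - overhead) / (double)ops;
}
```

Applied to the 100-iteration Pentium MMX lines above, this gives (113 - 14) / 100 = 0.99 cycles per integer add and (10295 - 14) / 100 = 102.81 cycles per float add.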
=====================================================================
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 Mobile CPU 1.70GHz
stepping : 4
cpu MHz : 1196.332
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 2359.29
gcc version 3.3.5 (Debian 1:3.3.5-12)
warming up with 1000 cycles...
integer_adder(), 1 adds, 80 cycles used (overhead = 80)
float_adder(), 1 adds, 1100 cycles used (overhead = 80)
integer_mult(), 1 mults, 88 cycles used (overhead = 80)
float_mult(), 1 mults, 1100 cycles used (overhead = 80)
warming up with 1000 cycles...
integer_adder(), 10 adds, 84 cycles used (overhead = 80)
float_adder(), 10 adds, 10352 cycles used (overhead = 80)
integer_mult(), 10 mults, 212 cycles used (overhead = 80)
float_mult(), 10 mults, 10400 cycles used (overhead = 80)
warming up with 1000 cycles...
integer_adder(), 100 adds, 124 cycles used (overhead = 80)
float_adder(), 100 adds, 128960 cycles used (overhead = 80)
integer_mult(), 100 mults, 14980 cycles used (overhead = 80)
float_mult(), 100 mults, 129124 cycles used (overhead = 80)
AMD CPUs
=====================================================================
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 8
model name : AMD-K6(tm) 3D processor
stepping : 12
cpu MHz : 350.055
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 pge mmx pni syscall
3dnow k6_mtrr
bogomips : 690.17
gcc version 3.3.5 (Debian 1:3.3.5-12)
warming up with 1000 cycles...
integer_adder(), 1 adds, 9 cycles used (overhead = 9)
float_adder(), 1 adds, 15 cycles used (overhead = 9)
integer_mult(), 1 mults, 12 cycles used (overhead = 10)
float_mult(), 1 mults, 16 cycles used (overhead = 9)
warming up with 1000 cycles...
integer_adder(), 10 adds, 18 cycles used (overhead = 9)
float_adder(), 10 adds, 67 cycles used (overhead = 9)
integer_mult(), 10 mults, 81 cycles used (overhead = 10)
float_mult(), 10 mults, 67 cycles used (overhead = 21)
warming up with 1000 cycles...
integer_adder(), 100 adds, 108 cycles used (overhead = 9)
float_adder(), 100 adds, 697 cycles used (overhead = 9)
integer_mult(), 100 mults, 308 cycles used (overhead = 11)
float_mult(), 100 mults, 698 cycles used (overhead = 9)
=====================================================================
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 4
model name : AMD Athlon(tm) processor
stepping : 2
cpu MHz : 1199.968
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca
cmov pat pse36 mmx fxsr pni syscall mmxext 3dnowext 3dnow
bogomips : 2359.29
gcc version 3.3.5 (Debian 1:3.3.5-12)
warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 15)
integer_mult(), 1 mults, 11 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 28 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 311 cycles used (overhead = 11)
integer_mult(), 100 mults, 385 cycles used (overhead = 11)
float_mult(), 100 mults, 311 cycles used (overhead = 11)
=====================================================================
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) XP 1700+
stepping : 2
cpu MHz : 1468.053
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 2891.77
gcc version 3.3.5 (Debian 1:3.3.5-12)
warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 11)
integer_mult(), 1 mults, 11 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 28 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 311 cycles used (overhead = 11)
integer_mult(), 100 mults, 385 cycles used (overhead = 11)
float_mult(), 100 mults, 311 cycles used (overhead = 11)
=====================================================================
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : AMD Athlon(tm) XP 2200+
stepping : 0
cpu MHz : 1808.905
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 3579.90
gcc version 3.3.5 (Debian 1:3.3.5-12)
warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 11)
integer_mult(), 1 mults, 11 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 36 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 322 cycles used (overhead = 11)
integer_mult(), 100 mults, 389 cycles used (overhead = 11)
float_mult(), 100 mults, 322 cycles used (overhead = 11)
=====================================================================
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 10
model name : AMD Athlon(tm) XP 2500+
stepping : 0
cpu MHz : 1830.138
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips : 3612.67
gcc version 3.3.6 (Debian 1:3.3.6-4)
warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 11)
integer_mult(), 1 mults, 12 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 28 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 322 cycles used (overhead = 11)
integer_mult(), 100 mults, 386 cycles used (overhead = 11)
float_mult(), 100 mults, 322 cycles used (overhead = 11)
=====================================================================
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : AMD Athlon(tm) XP 2600+
stepping : 1
cpu MHz : 2076.190
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 4112.38
gcc version 3.3.5 (Debian 1:3.3.5-12)
warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 11)
integer_mult(), 1 mults, 11 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 28 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 322 cycles used (overhead = 11)
integer_mult(), 100 mults, 390 cycles used (overhead = 11)
float_mult(), 100 mults, 322 cycles used (overhead = 11)
=====================================================================
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 10
model name : AMD Athlon(tm) XP 3200+
stepping : 0
cpu MHz : 2205.278
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips : 4358.14
gcc version 3.3.5 (Debian 1:3.3.5-12)
warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 11)
integer_mult(), 1 mults, 11 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 28 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 311 cycles used (overhead = 11)
integer_mult(), 100 mults, 385 cycles used (overhead = 11)
float_mult(), 100 mults, 311 cycles used (overhead = 11)
=====================================================================
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 12
model name : AMD Athlon(tm) 64 Processor 3400+
stepping : 0
cpu MHz : 2532.087
cache size : 512 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm
3dnowext 3dnow
bogomips : 5013.50
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-12)
warming up with 1000 cycles...
integer_adder(), 1 adds, 5 cycles used (overhead = 5)
float_adder(), 1 adds, 8 cycles used (overhead = 8)
integer_mult(), 1 mults, 5 cycles used (overhead = 5)
float_mult(), 1 mults, 8 cycles used (overhead = 8)
warming up with 1000 cycles...
integer_adder(), 10 adds, 5 cycles used (overhead = 5)
float_adder(), 10 adds, 8 cycles used (overhead = 8)
integer_mult(), 10 mults, 13 cycles used (overhead = 5)
float_mult(), 10 mults, 8 cycles used (overhead = 5)
warming up with 1000 cycles...
integer_adder(), 100 adds, 81 cycles used (overhead = 5)
float_adder(), 100 adds, 295 cycles used (overhead = 8)
integer_mult(), 100 mults, 281 cycles used (overhead = 5)
float_mult(), 100 mults, 295 cycles used (overhead = 8)
-Corey