[Ffmpeg-devel] int vs. float profiler, take 2

Corey Hickey bugfood-ml
Sat May 21 04:13:53 CEST 2005


Mike Melanson wrote:
> Hi,
> 	Since the first version of my little profiler generated a reasonable 
> amount of activity, attached is a slightly improved version. This one 
> does the following:
> 
> * runs all 4 of the functions n times as a cache warmup (n=1000 in the 
> code); this actually does help with cycle count consistency
> * fetches an overhead cycle count as a baseline
> * C code can fetch iteration count
> 
> The ASM code has ITERATIONS set to 1 right now. I would be interested to 
> know the results from varying CPUs using 1, 10, and 100 iterations.
> 
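For anyone who wants to reproduce the pattern without the attachment, the three steps in the list above (warm up, take a baseline, time the real call) could be sketched in C roughly like this. This is not Mike's code: read_counter(), empty_call(), struct sample, time_call() and profile() are names made up for illustration, the C integer_adder() is only a stand-in for the NASM routine, and timing an empty call is just an assumed way of getting the overhead figure.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* x86 with GCC/Clang: read the time-stamp counter. */
static uint64_t read_counter(void) { return __rdtsc(); }
#else
/* Fallback for other targets: coarse CPU-time ticks, not cycles. */
static uint64_t read_counter(void) { return (uint64_t)clock(); }
#endif

#define WARMUP_RUNS 1000  /* n=1000, as in the quoted post */

static volatile int sink;  /* gives the loop result an observable effect */

/* Hypothetical C stand-in for the NASM integer_adder(). */
static void integer_adder(int iterations)
{
    int acc = 0;
    for (int i = 0; i < iterations; i++)
        acc += i;
    sink = acc;
}

/* Does no work; timing it gives a baseline (assumed method). */
static void empty_call(int iterations) { (void)iterations; }

struct sample {
    uint64_t cycles;    /* raw count for the measured function */
    uint64_t overhead;  /* raw count for the empty call */
};

/* Counter delta across a single call of fn(iterations). */
static uint64_t time_call(void (*fn)(int), int iterations)
{
    uint64_t start = read_counter();
    fn(iterations);
    return read_counter() - start;
}

/* Warm up, then measure the baseline and the real function once each. */
static struct sample profile(void (*fn)(int), int iterations)
{
    struct sample s;

    assert(iterations > 0);
    for (int i = 0; i < WARMUP_RUNS; i++) {  /* cache warmup */
        fn(iterations);
        empty_call(iterations);
    }
    s.overhead = time_call(empty_call, iterations);
    s.cycles = time_call(fn, iterations);
    return s;
}
```

Calling profile(integer_adder, 100) returns a raw count and a baseline count, which is the same pair of numbers each output line reports as "cycles used" and "overhead".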

Here are results from all the machines I have access to from here. There's
still some run-to-run variation, but the differences between CPUs are larger
than the differences between runs on the same CPU.
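To compare CPUs it helps to reduce each output line to a net cost per operation, i.e. (cycles used - overhead) / number of operations. A minimal sketch of that arithmetic follows; cycles_per_op() is a hypothetical helper name, not something from the profiler, and it assumes the reported "cycles used" figure includes the overhead.

```c
#include <assert.h>

/*
 * Net cycles per operation, assuming "cycles used" includes the
 * measurement overhead reported on the same line.
 */
static double cycles_per_op(long cycles, long overhead, long ops)
{
    assert(ops > 0);
    return (double)(cycles - overhead) / (double)ops;
}
```

With the 100-iteration float_adder() lines from the results in this post, the Pentium MMX comes out at (10295 - 14) / 100 = 102.81 cycles per add and the Athlon XP 3200+ at (311 - 11) / 100 = 3.00.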

For all tests:
NASM version 0.98.38 compiled on Dec 30 2004

I can't keep the gcc versions identical because I don't have root on
many of these. I don't know if that makes a difference.


INTEL CPUs

=====================================================================
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 5
model           : 4
model name      : Pentium MMX
stepping        : 3
cpu MHz         : 199.488
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : yes
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 mmx
bogomips        : 393.21

gcc version 3.3.4 (Debian 1:3.3.4-13)

  warming up with 1000 cycles...
integer_adder(), 1 adds, 14 cycles used (overhead = 14)
float_adder(), 1 adds, 74 cycles used (overhead = 14)
integer_mult(), 1 mults, 24 cycles used (overhead = 15)
float_mult(), 1 mults, 73 cycles used (overhead = 14)
  warming up with 1000 cycles...
integer_adder(), 10 adds, 23 cycles used (overhead = 14)
float_adder(), 10 adds, 1010 cycles used (overhead = 14)
integer_mult(), 10 mults, 105 cycles used (overhead = 15)
float_mult(), 10 mults, 1011 cycles used (overhead = 14)
  warming up with 1000 cycles...
integer_adder(), 100 adds, 113 cycles used (overhead = 14)
float_adder(), 100 adds, 10295 cycles used (overhead = 14)
integer_mult(), 100 mults, 915 cycles used (overhead = 15)
float_mult(), 100 mults, 10315 cycles used (overhead = 14)

=====================================================================
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 Mobile CPU 1.70GHz
stepping        : 4
cpu MHz         : 1196.332
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 2359.29

gcc version 3.3.5 (Debian 1:3.3.5-12)

  warming up with 1000 cycles...
integer_adder(), 1 adds, 80 cycles used (overhead = 80)
float_adder(), 1 adds, 1100 cycles used (overhead = 80)
integer_mult(), 1 mults, 88 cycles used (overhead = 80)
float_mult(), 1 mults, 1100 cycles used (overhead = 80)
  warming up with 1000 cycles...
integer_adder(), 10 adds, 84 cycles used (overhead = 80)
float_adder(), 10 adds, 10352 cycles used (overhead = 80)
integer_mult(), 10 mults, 212 cycles used (overhead = 80)
float_mult(), 10 mults, 10400 cycles used (overhead = 80)
  warming up with 1000 cycles...
integer_adder(), 100 adds, 124 cycles used (overhead = 80)
float_adder(), 100 adds, 128960 cycles used (overhead = 80)
integer_mult(), 100 mults, 14980 cycles used (overhead = 80)
float_mult(), 100 mults, 129124 cycles used (overhead = 80)


AMD CPUs

=====================================================================
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 5
model           : 8
model name      : AMD-K6(tm) 3D processor
stepping        : 12
cpu MHz         : 350.055
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 pge mmx pni syscall
3dnow k6_mtrr
bogomips        : 690.17

gcc version 3.3.5 (Debian 1:3.3.5-12)

  warming up with 1000 cycles...
integer_adder(), 1 adds, 9 cycles used (overhead = 9)
float_adder(), 1 adds, 15 cycles used (overhead = 9)
integer_mult(), 1 mults, 12 cycles used (overhead = 10)
float_mult(), 1 mults, 16 cycles used (overhead = 9)
  warming up with 1000 cycles...
integer_adder(), 10 adds, 18 cycles used (overhead = 9)
float_adder(), 10 adds, 67 cycles used (overhead = 9)
integer_mult(), 10 mults, 81 cycles used (overhead = 10)
float_mult(), 10 mults, 67 cycles used (overhead = 21)
  warming up with 1000 cycles...
integer_adder(), 100 adds, 108 cycles used (overhead = 9)
float_adder(), 100 adds, 697 cycles used (overhead = 9)
integer_mult(), 100 mults, 308 cycles used (overhead = 11)
float_mult(), 100 mults, 698 cycles used (overhead = 9)

=====================================================================
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 4
model name      : AMD Athlon(tm) processor
stepping        : 2
cpu MHz         : 1199.968
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca
cmov pat pse36 mmx fxsr pni syscall mmxext 3dnowext 3dnow
bogomips        : 2359.29

gcc version 3.3.5 (Debian 1:3.3.5-12)

  warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 15)
integer_mult(), 1 mults, 11 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 28 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 311 cycles used (overhead = 11)
integer_mult(), 100 mults, 385 cycles used (overhead = 11)
float_mult(), 100 mults, 311 cycles used (overhead = 11)

=====================================================================
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 6
model name      : AMD Athlon(tm) XP 1700+
stepping        : 2
cpu MHz         : 1468.053
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 2891.77

gcc version 3.3.5 (Debian 1:3.3.5-12)

  warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 11)
integer_mult(), 1 mults, 11 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 28 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 311 cycles used (overhead = 11)
integer_mult(), 100 mults, 385 cycles used (overhead = 11)
float_mult(), 100 mults, 311 cycles used (overhead = 11)

=====================================================================
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2200+
stepping        : 0
cpu MHz         : 1808.905
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3579.90

gcc version 3.3.5 (Debian 1:3.3.5-12)

  warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 11)
integer_mult(), 1 mults, 11 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 36 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 322 cycles used (overhead = 11)
integer_mult(), 100 mults, 389 cycles used (overhead = 11)
float_mult(), 100 mults, 322 cycles used (overhead = 11)

=====================================================================
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 2500+
stepping        : 0
cpu MHz         : 1830.138
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips        : 3612.67

gcc version 3.3.6 (Debian 1:3.3.6-4)

  warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 11)
integer_mult(), 1 mults, 12 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 28 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 322 cycles used (overhead = 11)
integer_mult(), 100 mults, 386 cycles used (overhead = 11)
float_mult(), 100 mults, 322 cycles used (overhead = 11)

=====================================================================
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2600+
stepping        : 1
cpu MHz         : 2076.190
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 4112.38

gcc version 3.3.5 (Debian 1:3.3.5-12)

  warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 11)
integer_mult(), 1 mults, 11 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 28 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 322 cycles used (overhead = 11)
integer_mult(), 100 mults, 390 cycles used (overhead = 11)
float_mult(), 100 mults, 322 cycles used (overhead = 11)

=====================================================================
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 3200+
stepping        : 0
cpu MHz         : 2205.278
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips        : 4358.14

gcc version 3.3.5 (Debian 1:3.3.5-12)

  warming up with 1000 cycles...
integer_adder(), 1 adds, 11 cycles used (overhead = 11)
float_adder(), 1 adds, 11 cycles used (overhead = 11)
integer_mult(), 1 mults, 11 cycles used (overhead = 11)
float_mult(), 1 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 10 adds, 11 cycles used (overhead = 11)
float_adder(), 10 adds, 11 cycles used (overhead = 11)
integer_mult(), 10 mults, 28 cycles used (overhead = 11)
float_mult(), 10 mults, 11 cycles used (overhead = 11)
  warming up with 1000 cycles...
integer_adder(), 100 adds, 80 cycles used (overhead = 11)
float_adder(), 100 adds, 311 cycles used (overhead = 11)
integer_mult(), 100 mults, 385 cycles used (overhead = 11)
float_mult(), 100 mults, 311 cycles used (overhead = 11)

=====================================================================
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 12
model name      : AMD Athlon(tm) 64 Processor 3400+
stepping        : 0
cpu MHz         : 2532.087
cache size      : 512 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm
3dnowext 3dnow
bogomips        : 5013.50
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-12)

  warming up with 1000 cycles...
integer_adder(), 1 adds, 5 cycles used (overhead = 5)
float_adder(), 1 adds, 8 cycles used (overhead = 8)
integer_mult(), 1 mults, 5 cycles used (overhead = 5)
float_mult(), 1 mults, 8 cycles used (overhead = 8)
  warming up with 1000 cycles...
integer_adder(), 10 adds, 5 cycles used (overhead = 5)
float_adder(), 10 adds, 8 cycles used (overhead = 8)
integer_mult(), 10 mults, 13 cycles used (overhead = 5)
float_mult(), 10 mults, 8 cycles used (overhead = 5)
  warming up with 1000 cycles...
integer_adder(), 100 adds, 81 cycles used (overhead = 5)
float_adder(), 100 adds, 295 cycles used (overhead = 8)
integer_mult(), 100 mults, 281 cycles used (overhead = 5)
float_mult(), 100 mults, 295 cycles used (overhead = 8)


-Corey




