[MPlayer-dev-eng] Using -O4 vs. -Os
Zoltan Hidvegi
mplayer at hzoli.2y.net
Wed Oct 15 10:55:20 CEST 2003
> On Tue, Oct 14, 2003 at 05:14:05PM -0500, Zoltan Hidvegi wrote:
>
> > For the discussion about using -O4 vs. -Os, I've run some benchmarks,
> > on my Athlon XP Thoroughbred 2233 MHz, 194MHz fsb machine, using
> > gcc-3.3.2 prerelease (debian unstable 3.3.2-0pre5). Compile options
> > for the -Os compile were -Os -march=athlon-4 -mcpu=athlon-4 -pipe
> > -ffast-math -fomit-frame-pointer, and the same with -O4 instead of -Os
> > for the -O4 tests. Most of the time there is not much difference
> > between -O4 and -Os, -O4 is usually faster, but sometimes -Os is
> > slightly faster (e.g. for the gaussian scale of denoise3d filters).
> > However, for hqdn3d, -Os is 5x slower, which is very strange.
>
> First of all: there ain't no thing as -O4, -O3 is the highest
I know that, but mplayer uses -O4 by default, and I was comparing to
the default compile flags.
> optimisation level. I hope you ran the tests more than just once to
> eliminate fluctuation; if so you should also supply the number of
Of course I did, and the fluctuation was 0.1% or less after the first
run. This is not a research paper, and I have no time to write a
comprehensive report. I run with HZ=1000, which should make the CPU
time measurements more accurate.
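
A minimal sketch of how such repeated CPU-time measurements could be
scripted, for anyone who wants to reproduce the comparison. It is not
the procedure used for the numbers above: the function names are
hypothetical, the workload in the demo is a placeholder (the actual
mplayer command line is not given in this message), and it assumes a
Unix system where Python's resource module is available.

```python
import resource
import subprocess
import sys


def child_cpu_seconds(cmd):
    """Run cmd to completion and return the user+system CPU time it used."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return ((after.ru_utime - before.ru_utime)
            + (after.ru_stime - before.ru_stime))


def relative_spread(times):
    """Return (max - min) / mean of the timings, or 0.0 if the mean is 0."""
    mean = sum(times) / len(times)
    if mean == 0:
        return 0.0
    return (max(times) - min(times)) / mean


def benchmark(cmd, runs=5):
    """Time cmd `runs` times and drop the first (cache warm-up) run."""
    if runs < 2:
        raise ValueError("need at least 2 runs to discard the first one")
    times = [child_cpu_seconds(cmd) for _ in range(runs)]
    return times[1:]


if __name__ == "__main__":
    # Placeholder workload; substitute the real benchmark command here.
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    kept = benchmark(cmd, runs=4)
    print("CPU seconds per run:", kept)
    print("relative spread: %.2f%%" % (100 * relative_spread(kept)))
```

Running the same script against a -Os build and a -O4 build and
comparing the kept timings would give the kind of comparison discussed
here; a spread near 0.1% means run-to-run noise is much smaller than
the differences being measured.
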
> -O3 runs a set of more complicated optimisations which can pay off
> sometimes but typically bloats the code. You should also use proper
> alignment but IIRC this is automatically implied by -mcpu=<cpu>.
Yes, alignment is automatic with -O2 and -O3, but -Os probably
disables some alignment to save space.
> -Os optimises for size which means that it's a good cache saver;
> good locality especially in caches can dramatically boost performance
> and make software fly.
-Os only optimizes for code size, and I do not think that mplayer is
I-cache limited. Most of the time is spent in small tight loops.
Mplayer can be data cache limited, but -Os does not affect that.
-funroll-loops may increase the cache usage, but that is not enabled
by -O3. On RISC and Itanium -funroll-loops usually makes the code
faster, but on x86 it does not help much.
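
To illustrate why unrolling grows the code, here is the transformation
written out by hand. This is only a sketch of the idea in Python; gcc
performs it on the generated machine code, and the unroll factor of 4
and the function names are arbitrary assumptions.

```python
def sum_simple(values):
    """One addition and one loop test per element."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled4(values):
    """Same result, but the loop body is replicated four times.

    The loop test runs once per four elements, at the price of a
    larger body plus a separate remainder loop.
    """
    total = 0
    n = len(values)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    i = 0
    while i < limit:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    while i < n:  # leftover 0 to 3 elements
        total += values[i]
        i += 1
    return total
```

The unrolled version does the same work with fewer loop tests, but the
body is roughly four times as long, which is where the extra I-cache
pressure comes from.
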
Actually, gcc-3.3.2 seems to miscompile denoise3d with -Os.
Zoli