[FFmpeg-devel] [PATCH] simplify some mpc7 code

Thu Jan 28 21:01:09 CET 2010

On Thu, Jan 28, 2010 at 08:03:58AM +0200, Kostya wrote:
> On Wed, Jan 27, 2010 at 09:32:16PM +0100, Reimar D?ffinger wrote:
> > On Wed, Jan 27, 2010 at 09:29:41PM +0100, Reimar D?ffinger wrote:
> > > On Wed, Jan 27, 2010 at 10:13:51PM +0200, Kostya wrote:
> > > > On Wed, Jan 27, 2010 at 09:08:01PM +0100, Reimar D?ffinger wrote:
> > > > > Hello,
> > > > > there is the same code duplicated 5 times, which IMO obfuscates the algorithm
> > > > > a lot, attached patch extracts it into a function.
> > > > > It also reworks a similar but different piece of code so it is IMO simple and
> > > > > gcc no longer complains about "t" possibly being used uninitialized.
> > > > 
> > > > Idea look OK but I see get_bits(8) in function and get_bits(6) in later
> > > > code. Have you tested it? And maybe add inline qualifier to that
> > > > function?
> > > 
> > > Tested now. I didn't try to benchmark if there's a point in the "inline".
> > > There is another simplification possible, but I think that definitely
> > > needs bechmarking, namely instead of the switch do
> > > int scfi = bands[i].scfi[ch];
> > > bands[i].scf_idx[ch][1] = scfi & 1 ? bands[i].scf_idx[ch][0] : get_scale_idx(&gb, bands[i].scf_idx[ch][0]);
> > > bands[i].scf_idx[ch][2] = scfi & 2 ? bands[i].scf_idx[ch][1] : get_scale_idx(&gb, bands[i].scf_idx[ch][1]);
> > > There might even be another simplification step possible after this, not sure.
> > 
> > Forgot updated patch.
> 
> OK if works and triggers that condition in get_scale_idx()

Hm, I'd like to leave the inline away.
Reason:
Representative value with old code and av_always_inline:
58404 dezicycles in a, 8192 runs, 0 skips

New code, without av_always_inline:
55407 dezicycles in a, 8192 runs, 0 skips

I.e. not inlining is 5% faster (Intel Atom, gcc 4.4.1).

Notes: just "inline" or not makes no difference, but I don't
like using it when its likely enough to hurt.
I had to hack away the skips detection, because the run
time for the whole loop varies extremely.
Timings are for the whole "get scale indexes" loop.