[FFmpeg-devel] [PATCH] AAC: unroll parts of decode_spectrum_and_dequant()

Michael Niedermayer michaelni
Tue Dec 9 21:37:39 CET 2008


On Tue, Dec 09, 2008 at 01:28:53PM -0000, Måns Rullgård wrote:
> 
> Michael Niedermayer wrote:
> > On Mon, Dec 08, 2008 at 08:04:10PM -0800, Jason Garrett-Glaser wrote:
> >> On Mon, Dec 8, 2008 at 7:58 PM, Jason Garrett-Glaser
> >> <darkshikari at gmail.com> wrote:
> >> > On Mon, Dec 8, 2008 at 7:34 PM, Alex Converse <alex.converse at gmail.com> wrote:
> >> >> On Mon, Dec 8, 2008 at 9:33 PM, Jason Garrett-Glaser
> >> >> <darkshikari at gmail.com>wrote:
> >> >>
> >> >>> On Mon, Dec 8, 2008 at 3:43 PM, Alex Converse <alex.converse at gmail.com> wrote:
> >> >>> > Hi,
> >> >>> >
> >> >>> > The attached patch unrolling sections of decode spectrum saves me 5.48%
> >> >>> > on my mpeg4-lc-256kbps stream on my core2 duo.
> >> >>> >
> >> >>> > Regards,
> >> >>> > Alex Converse
> >> >>>
> >> >>> If dim can only be 2 or 4, wouldn't it be better to do
> >> >>>
> >> >>> if( dim == 4 ) {
> >> >>> do dim 4 stuff
> >> >>> }
> >> >>> do dim 2 stuff
> >> >>>
> >> >>> The switch seems unnecessary.
> >> >>>
> >> >>
> >> >> Idiomatically I like the switch better but your way is faster. When I did
> >> >> that I also tried reverting access back to forward order and got a slight
> >> >> speed up. This way made the unsigned loop just like the other three, so I
> >> >> added that one for another benchmarked verified speed up.
> >> >>
> >> >> The net gain is a 12% decrease in cycles over the original vs 5% before.
> >> >
> >> > if (vq_ptr[2]) coef[coef_tmp_idx + 2] = 1 - 2*(int)get_bits1(gb);
> >> > if (vq_ptr[3]) coef[coef_tmp_idx + 3] = 1 - 2*(int)get_bits1(gb);
> >> >
> >> > Isn't that a rather unnecessary int -> float conversion?  I'd think
> >> > you could do much better than that considering there are only two
> >> > possible input values...
> >> >
> >> > Dark Shikari
> >> >
> >>
> >> Simple proposal for the above:
> >>
> >> static const float lookup[2] = {1.0, -1.0};
> >> if (vq_ptr[2]) coef[coef_tmp_idx + 2] = lookup[get_bits1(gb)];
> >
> >
> > something like:
> > if (vq_ptr[2]) ((uint32_t*)coef)[coef_tmp_idx + 2] = (get_bits1(gb)<<31) + 0x3F800000;
> >
> > might be even faster
> > but i agree with robert that this should be a separate patch
> 
> Strict aliasing violation. 

it can be implemented with a union if it's faster; the question is whether it is
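
For comparison, here is a minimal sketch of the three ways discussed in this thread to turn one sign bit into +1.0 or -1.0. The function names are hypothetical and none of this is taken from the patch. It assumes a 32-bit IEEE 754 float, where 0x3F800000 is the bit pattern of 1.0f and bit 31 is the sign, and it takes the bit as a plain argument instead of calling get_bits1(gb). The union variant relies on reading a union member other than the one last written, which GCC documents as supported. Which variant is fastest is not settled here and would have to be benchmarked on each CPU.

```c
#include <stdint.h>

/* Variant 1: integer arithmetic, then an int -> float conversion,
 * as in the quoted patch line "1 - 2*(int)get_bits1(gb)". */
static float sign_arith(unsigned bit)
{
    return (float)(1 - 2 * (int)bit);
}

/* Variant 2: two-entry lookup table, as proposed by Jason. */
static float sign_lookup(unsigned bit)
{
    static const float lookup[2] = { 1.0f, -1.0f };
    return lookup[bit];
}

/* Variant 3: build the float's bit pattern directly, going through a
 * union instead of casting the float pointer to uint32_t*.
 * Assumption: float is a 32-bit IEEE 754 single. */
static float sign_union(unsigned bit)
{
    union { uint32_t u; float f; } v;
    v.u = ((uint32_t)bit << 31) + 0x3F800000u; /* 1.0f, sign bit set if bit == 1 */
    return v.f;
}
```

All three take bit = 0 or 1 and should agree, e.g. sign_union(0) is 1.0f and sign_union(1) is -1.0f.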


> Depending on CPU it might also be slower.

or faster 


> Most FPUs can generate +-1 constants efficiently.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Good people do not need laws to tell them to act responsibly, while bad
people will find a way around the laws. -- Plato