[Ffmpeg-devel] mpegaudiodec.c and armv5te optimizations

Wed Oct 4 10:05:46 CEST 2006

Hi

On Wed, Oct 04, 2006 at 01:47:23AM +0300, Siarhei Siamashka wrote:
> On Tuesday 03 October 2006 23:34, Michael Niedermayer wrote:
> 
> > > I would like to ask those who are more familiar with the mp3 decoding
> > > algorithm in mpegaudiodec.c whether any really nasty things could
> > > happen after changing the current
> > >
> > > #define MULH(a,b) (((int64_t)(a) * (int64_t)(b))>>32)
> > > #define FIXHR(a) ((int)((a) * (1LL<<32) + 0.5))
> > >
> > > to something like
> > >
> > > #define MULH(a,b) (((int64_t)(a) * (int16_t)(b))>>16)
> > > #define FIXHR(a) ((int16_t)((a) * (1LL<<16) + 0.5))
> > >
> > > in low quality decoding mode.
> > >
> > > I tried to decode a few mp3 files and the difference does not seem to be
> > > very noticeable (samples seem to differ +-4 at most).
> >
> > i think the change should be ok (for ARM); for x86 it should be slower
> 
> Sure, I just wanted to know if reduction of precision of these constants from
> 32-bit to 16-bit could have any other negative effect. And this optimization
> can really only be used for processors that have a special instruction for
> that operation.
> 
> Anyway, here is a simple patch attached.
> 
> Tested with the latest mplayer SVN (with some tweaks to get it compiled with
> HAVE_ARMV5TE defined). Configured using:
> CFLAGS="-O4 -pipe -ffast-math -fomit-frame-pointer -mcpu=arm926ej-s -DHAVE_ARMV5TE" ./configure --disable-libavcodec_mpegaudio_hp
> 
> Results of decoding mp3 file to /dev/null:
> ffmp3 (current): 58.7 seconds
> ffmp3 (patched): 56.6 seconds
> libmad: 46.2 seconds
> 
> Effect is minimal and quite disappointing. We gain very little, 

17% of the difference between libmad and ffmp3 isn't that small;
if you do another 5 such optimizations we would beat libmad
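
For reference, the arithmetic behind the 17% figure can be sketched as below. The three timings are copied from the quoted benchmark; the helper names are made up for this example, and the projection assumes that every further optimization saves exactly as much as this one, which is only a rough guess.

```c
#include <assert.h>

/* Timings in seconds, copied from the quoted benchmark results. */
static const double T_CURRENT = 58.7; /* ffmp3 (current) */
static const double T_PATCHED = 56.6; /* ffmp3 (patched) */
static const double T_LIBMAD  = 46.2; /* libmad          */

/* Fraction of the ffmp3-to-libmad gap that the patch closes. */
static double gap_closed(void)
{
    return (T_CURRENT - T_PATCHED) / (T_CURRENT - T_LIBMAD);
}

/* Projected decode time after n optimizations of the same size.
 * Assumption: the savings simply add up, which real code rarely does. */
static double projected(int n)
{
    assert(n >= 0);
    return T_CURRENT - n * (T_CURRENT - T_PATCHED);
}
```

gap_closed() comes out at 2.1 / 12.5, about 0.168. Under the linear assumption, this patch plus five more of the same size (n = 6) lands just below the libmad time, while n = 5 does not.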


> but lose some
> precision. 

yes, but it's not much, worst case +-4 difference.
btw, could you test by how much the high-low quality difference changes with
this optimization (mean squared error and max difference)? if it doesn't
double the error then i am in favor of applying this patch
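
A minimal sketch of such a measurement, assuming both decoder outputs are available as equally long buffers of 16-bit PCM samples. PcmDiff and pcm_compare are hypothetical names for this example, not existing libavcodec functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type for this example. */
typedef struct {
    double mse;          /* mean squared error over all samples   */
    int    max_abs_diff; /* largest absolute per-sample difference */
} PcmDiff;

/* Compare a reference decode against a test decode, sample by sample. */
static PcmDiff pcm_compare(const int16_t *ref, const int16_t *test, size_t n)
{
    PcmDiff r = { 0.0, 0 };
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        int d  = (int)ref[i] - (int)test[i]; /* widen first, so no overflow */
        int ad = abs(d);

        sum += (double)d * d;
        if (ad > r.max_abs_diff)
            r.max_abs_diff = ad;
    }
    r.mse = sum / (double)n;
    return r;
}
```

Running it once on high quality vs. low quality output without the patch and once with the patch gives the two error figures to compare against the "doesn't double" criterion.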


> But it is understandable, compiler can't load and pack two 16-bit
> constants in a register, also it does not take into account 1 clock penalty
> if the result of multiplication is used immediately in the next instruction.
> So for any really useful results, fully assembler optimized code is required.

or writing a better compiler :)
there's nothing which prevents the compiler from doing these optimizations, short
of the incompetence of the developers who wrote the compiler
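
To illustrate what "pack two 16-bit constants in a register" means, here is a portable C sketch. It assumes the special instruction in question is the ARMv5TE SMULWB/SMULWT pair, which multiplies a 32-bit value by the bottom or top halfword of a register and keeps the top 32 bits of the 48-bit product; the thread itself does not name the instruction. All function names are made up for this example.

```c
#include <stdint.h>

/* Pack two 16-bit fixed-point constants into one 32-bit word. */
static uint32_t pack16(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Same arithmetic as the proposed 16-bit MULH, using the bottom halfword.
 * The narrowing cast and the >> on a negative value are implementation
 * defined in C; this assumes the usual two's complement behaviour. */
static int32_t mulw_bottom(int32_t a, uint32_t packed)
{
    return (int32_t)(((int64_t)a * (int16_t)(packed & 0xFFFFu)) >> 16);
}

/* Same, but using the top halfword of the packed word. */
static int32_t mulw_top(int32_t a, uint32_t packed)
{
    return (int32_t)(((int64_t)a * (int16_t)(packed >> 16)) >> 16);
}
```

With the constants packed like this, one register load serves two multiplications, which is the saving the compiler is not finding on its own.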

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is