[Ffmpeg-devel] mpegaudiodec.c and armv5te optimizations

Thu Oct 5 07:01:10 CEST 2006

On Wednesday 04 October 2006 11:05, Michael Niedermayer wrote:

[...]

> > > > I would like to ask those who are familiar with mp3 decoding
> > > > algorithm in mpegaudiodec.c better if there could be any really nasty
> > > > things happening after changing current
> > > >
> > > > #define MULH(a,b) (((int64_t)(a) * (int64_t)(b))>>32)
> > > > #define FIXHR(a) ((int)((a) * (1LL<<32) + 0.5))
> > > >
> > > > to something like
> > > >
> > > > #define MULH(a,b) (((int64_t)(a) * (int16_t)(b))>>16)
> > > > #define FIXHR(a) ((int16_t)((a) * (1LL<<16) + 0.5))
> > > >
> > > > in low quality decoding mode.

[test results of a simple inline asm patch skipped]

> > Effect is minimal and quite disappointing. We gain very little,
>
> 17% of the difference between libmad and ffmp3 isnt that small
> if you do another 5 such optimizations we would beat libmad
>
> > but lose some
> > precision.
>
> yes, but its not much, worst case +-4 difference
> btw, could you test by how much the high-low quality difference changes
> with this optimization (mean squared error and max difference), if it
> doesnt double the error then iam in favor of applying this patch

That +-4 difference was just a quick and completely unscientific
observation of cmp output. Proper tests are required to get valid
measurements of this quality reduction. I also just got an idea: since I
benchmarked a fast, low quality build of libmad, it would be interesting
to check not only its performance but also its quality. It may be that
the ffmpeg decoder is somewhat slower but has higher quality, which could
explain the results, and that could be considered a win already :-)

But I have a few questions:
1. Are there any tools that can be used for such tests? Could they be
'audiogen.c' and 'tiny_psnr.c' from the tests subdirectory in ffmpeg? If
ffmpeg already has tools for audio quality regression testing, it would be
stupid not to use them and to invent something new instead :)
2. What audio samples should be used (for example, testing on a file with
complete silence encoded would be stupid)? Sound volume probably also has
some effect on decoding precision.
3. What should be used as a reference mp3 decoder that is supposed to have
ideal quality? Or maybe just grab some audio CD, encode it (with
something?) at different bitrates, decode it with different mp3 decoders,
and compare the results against the original CD audio rather than against
a reference decoder?
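
Whatever tool ends up being used, the two numbers asked for above (mean
squared error and max difference) are simple to compute from two decoded
PCM streams. Below is a minimal sketch, assuming both decoders produce
signed 16-bit samples of equal length. The names pcm_error and pcm_compare
are made up for illustration and are not taken from ffmpeg or from
tiny_psnr.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type, not part of ffmpeg. */
typedef struct {
    double mse;      /* mean squared error over all samples */
    int    max_diff; /* largest absolute per-sample difference */
} pcm_error;

/* Compare two buffers of signed 16-bit PCM samples of equal length n. */
static pcm_error pcm_compare(const int16_t *ref, const int16_t *test, size_t n)
{
    pcm_error e = { 0.0, 0 };
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        int d  = (int)ref[i] - (int)test[i]; /* fits easily in int */
        int ad = abs(d);

        sum += (double)d * d;
        if (ad > e.max_diff)
            e.max_diff = ad;
    }
    e.mse = sum / (double)n;
    return e;
}
```

Running this once on high quality vs. low quality output, and once more
with the reduced precision macros enabled, would show whether the error
doubles or not.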

Anyway, quality regression tests of such an option can be performed even
without an ARM cpu. The macros mentioned at the top of this message can be
used to simulate the precision of such a decoder on any arch. So if
anybody could help with performing these tests and making sure that I did
not mess things up, it would be really great.
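
To illustrate the simulation idea, here is a small sketch that puts both
macro pairs side by side and reports how far the low quality result is
from the current one for a single multiplication. The _HQ/_LQ suffixes and
the mulh_error helper are my own naming for this example only. It also
assumes that >> on a negative int64_t is an arithmetic shift, which is what
gcc does but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>

/* Current macros, as quoted at the top of this message. */
#define MULH_HQ(a,b) (((int64_t)(a) * (int64_t)(b))>>32)
#define FIXHR_HQ(a)  ((int)((a) * (1LL<<32) + 0.5))

/* Proposed low quality variants with a 16-bit coefficient. */
#define MULH_LQ(a,b) (((int64_t)(a) * (int16_t)(b))>>16)
#define FIXHR_LQ(a)  ((int16_t)((a) * (1LL<<16) + 0.5))

/*
 * Hypothetical helper: difference between the low quality and the
 * current result of multiplying sample x by coefficient c.
 * The range check keeps both FIXHR variants free of overflow and keeps
 * the example simple; it is an assumption of this sketch only.
 */
static int mulh_error(int32_t x, double c)
{
    assert(c >= 0.0 && c <= 0.49);

    int64_t hq = MULH_HQ(x, FIXHR_HQ(c));
    int64_t lq = MULH_LQ(x, FIXHR_LQ(c));

    return (int)(lq - hq);
}
```

For example, with x = 1000000 and c = 0.3 the current macros give 300000
and the 16-bit variant gives 300003, so mulh_error returns 3. A coefficient
that is exactly representable in 16 bits, such as 0.25, gives no error.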

Right now I don't have a good understanding of the underlying algorithms
yet and can only optimize code at the function level. You can use me as
some sort of code generator producing efficient code for ARM :-)

mpegaudiodec.c can have at least dct32, imdct12 and imdct36 implemented
entirely in assembler, and that will have a much bigger effect than just
using a few inline macros. But I need to know that this work is not going
to be wasted because of a wrong assumption about the effect on quality.

> > But it is understandable, compiler can't load and pack two 16-bit
> > constants in a register, also it does not take into account 1 clock
> > penalty if the result of multiplication is used immediately in the next
> > instruction. So for any really useful results, fully assembler optimized
> > code is required.
>
> or writing a better compiler :)
> theres nothing which prevents the compiler from doing these optimization
> short of the incompetence of the developers who wrote the compiler

Well, that's right. For example, code like this could probably be
optimized properly, with the compiler figuring out that it can use a fast
16bit*16bit->32bit multiply instruction here:

int32_t fast_multiply(int16_t a, int16_t b)
{
   return (int32_t)a * b;
}

But my gcc 3.4.4 (currently recommended by Nokia for use with the maemo
SDK) doesn't seem to be able to do it. I'll try the latest gcc 4 to see
whether it has improved, but I doubt that it can get close to hand written
code :) I forgot to add that there are also multiply-accumulate
instructions, which may be difficult for a compiler to generate as well,
but which save the extra addition instruction that would otherwise be
required.
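
To make the packing and multiply-accumulate points more concrete, here is
a portable C sketch of the data layout I have in mind: two 16-bit
coefficients share one 32-bit word, and each step is a 16bit*16bit
multiply added to an accumulator. All function names are invented for this
example. On armv5te each mla16 call has the shape of a single
multiply-accumulate instruction (SMLABB/SMLABT style), but whether a given
compiler actually emits one is exactly the open question, so treat this as
a model of the arithmetic and not as optimized code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two 16-bit coefficients: lo in bits 0..15, hi in bits 16..31. */
static uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Extract the halves again (assumes two's complement conversion). */
static int16_t bottom16(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFF); }
static int16_t top16(uint32_t w)    { return (int16_t)(uint16_t)(w >> 16); }

/* acc + a*b with 16-bit operands: one multiply-accumulate step. */
static int32_t mla16(int32_t acc, int16_t a, int16_t b)
{
    return acc + (int32_t)a * b;
}

/*
 * Dot product of 2*pairs samples with packed coefficient pairs.
 * The caller is assumed to keep the sum within int32_t range.
 */
static int32_t dot_packed(const int16_t *x, const uint32_t *coef, size_t pairs)
{
    int32_t acc = 0;

    assert(x != NULL && coef != NULL);
    for (size_t i = 0; i < pairs; i++) {
        acc = mla16(acc, x[2 * i],     bottom16(coef[i]));
        acc = mla16(acc, x[2 * i + 1], top16(coef[i]));
    }
    return acc;
}
```

The point of the packed layout is that one register load brings in two
coefficients, which is what the compiler fails to do on its own.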

By the way, I did a quick search and it does not seem like any other
(open source) libraries have armv5te optimizations yet. So, if I'm not
mistaken, ffmpeg can be the first here. And most of the widely used linux
based PDAs (Nokia 770 and Sharp Zaurus, did I miss anything?) already
support these instructions.