[FFmpeg-devel] FATE & Regressions (and PPC is broken)

Sun Mar 16 13:57:55 CET 2008

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> On 16 March 2008, Michael Niedermayer wrote:
>> On Sun, Mar 16, 2008 at 12:26:15AM +0000, M?ns Rullg?rd wrote:
>> > M?ns Rullg?rd <mans at mansr.com> writes:
>> > > Diego Biurrun <diego at biurrun.de> writes:
>> > >> On Sat, Mar 15, 2008 at 02:24:41AM +0100, Michael Niedermayer wrote:
>> > >>> On Sat, Mar 15, 2008 at 12:36:36AM +0000, M?ns Rullg?rd wrote:
>> > >>> > Mike Melanson <mike at multimedia.cx> writes:
>> > >>> > > The following regression test is broken on PowerPC (gcc 4.1.2):
>> > >>> > >
>> > >>> > > diff -u -w
>> > >>> > > "/home/melanson/ffmpeg/ffmpeg-main"/tests/ffmpeg.regression.ref
>> > >>> > > tests/data/vsynth.regression
>> > >>> > > --- /home/melanson/ffmpeg/ffmpeg-main/tests/ffmpeg.regression.ref
>> > >>> > > 2008-03-08 19:10:07.000000000 -0800
>> > >>> > > +++ tests/data/vsynth.regression        2008-03-12
>> > >>> > > 14:27:53.000000000 -0700 @@ -197,7 +197,7 @@
>> > >>> > >  353368 ./tests/data/a-flac.flac
>> > >>> > >  c4228df189aad9567a037727d0e763e4
>> > >>> > > *./tests/data/flac.vsynth.out.wav stddev: 33.31 PSNR:65.87
>> > >>> > > bytes:1040384
>> > >>> > > -a1e56bb034c2773438ba1c830c4cea07 *./tests/data/a-wmav1.asf
>> > >>> > > +59575b4756d674cd8c8d53d86c08e8cb *./tests/data/a-wmav1.asf
>> > >>> > >  106004 ./tests/data/a-wmav1.asf
>> > >>> > >  stddev:12251.50 PSNR:14.56 bytes:1056768
>> > >>> > >  stddev:2106.00 PSNR:29.85 bytes:1048576
>> > >>> > > make: *** [codectest] Error 1
>> > >>> >
>> > >>> > OK, I tracked this one down.  It turns out lrint() rounds slightly
>> > >>> > differently for whatever reason (bug or otherwise).  Attached patch
>> > >>> > changes it to lround() (which has well-defined rounding behaviour).
>> > >>> > With this change, the results are the same on x86_64 and ppc64.
>> > >>> >
>> > >>> > This raises the broader question of lrint() usage in FFmpeg.  The
>> > >>> > documentation for the lrint() family says:
>> > >>> >
>> > >>> >   These functions shall round their argument to the nearest integer
>> > >>> >   value, rounding according to the current rounding direction.
>> > >>> >
>> > >>> > Clearly, in a library we do not know the current rounding
>> > >>> > direction, and it would be rude of us to change it.  The lround()
>> > >>> > family, on the other hand, always rounds the same way:
>> > >>> >
>> > >>> >   These functions shall round their argument to the nearest integer
>> > >>> >   value, rounding halfway cases away from zero, regardless of the
>> > >>> >   current rounding direction.
>> > >>>
>> > >>> We have to use lrint* because its faster see:
>> > >>
>> > >> Hmm, does speed really matter that much in that part of the WMA
>> > >> encoder? Or is that just a comment for the general case and Mans'
>> > >> patch is OK?
>> > >
>> > > Yes, speed matters.  Besides, further investigation shows the
>> > > inaccuracy arises elsewhere.  This change worked by accident.
>> >
>> > I've tracked down the real reason for the difference.  It results from
>> > lost precision by use of floats in mdct.c.  Attached patch makes a few
>> > temporary values double instead.  I don't know what it might do to
>> > performance, though.
>
> Hi, I'm sorry to interfere, but it would be nice to clarify a few things.
>
> Could you also test old code on x86 with different gcc options
> (check -O0, -ffloat-store, -mfpmath=sse and the others) to see 
> if affects results of this regression test?

-mfpmath=sse is default on x86_64, which is what I'm using.  I didn't
try the others.

> The problem may be related to the fact that x87 has 80-bit registers 
> and the precision can highly depend on the number of load/store 
> operations (if temporary variables in mdct are assigned to registers, 
> the precision may be higher, if temporary variables are allocated on 
> stack, the precision would be lost on stores clipping 80-bit registers 
> to 32-bit variables on stack). As far as I know, it is x87 weird here 
> and most of the other platforms do floating point math in a right way.

How does -mfpmath=sse handle single-precision values in registers?

> The change from floats to doubles can increase precision when working
> with temporaries for non-x86 platforms, and could reduce the effect,
> but it is only the change from (32-bit vs. 80bit) to (64-bit to 80-bit) 
> and it might only hide the problem for this particular regression test.

When using multiply-accumulate (fmadd etc.) instructions on PPC, the
intermediate product is represented with a 106-bit fraction.  These
instructions are used heavily in the MDCT.

>> I dont know about performance either, i guess there are some situations
>> where it would be slower 
>
> Regarding the performance, doubles are slower than floats when performing
> multiplications at least on ARM (VFP11), also extra conversion from
> float to double is needed on loads which affects performance even more.

On PPC, registers are always in double-precision format,
single-precision values being converted on load.  Single-precision
arithmetic instructions round the result to single precision, but
still use double-precision format.

>> but systems should have asm MDCTS anyway ...
>> So iam ok with the change. 
>
> It still would be nice to keep C code as a reference implementation, so 
> that it could be always possible to check if an assembly optimized mdct
> produces bitexact results.

Floating-point isn't bit-exact, as has just been demonstrated.

-- 
M?ns Rullg?rd
mans at mansr.com