[MPlayer-cvslog] r33811 - trunk/libaf/af_format.c

Ivan Kalvachev ikalvachev at gmail.com
Thu Jul 7 01:52:58 CEST 2011


On 7/6/11, Ivan Kalvachev <ikalvachev at gmail.com> wrote:
> On 7/5/11, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
>> On Tue, Jul 05, 2011 at 12:19:41PM +0300, Ivan Kalvachev wrote:
>>> On 7/5/11, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
>>> >
>>> >
>>> > On 5 Jul 2011, at 00:24, iive <subversion at mplayerhq.hu> wrote:
>>> >
>>> >> Author: iive
>>> >> Date: Tue Jul  5 00:24:52 2011
>>> >> New Revision: 33811
>>> >>
>>> >> Log:
>>> >> Audio format conversion from float to int is not checked for
>>> >> overflow.
>>> >> The float may be slightly off the [-1;1] range thus causing audible
>>> >> crackling noise.
>>> >>
>>> >> Add proper clipping using av_clip functions.
>>> >
>>> > That is not all you did.
>>> > You changed the multipliers, you added a use of ldexp for half the
>>> > cases
>>> > and
>>> > you made the code throw away up to 8 bit precision when converting to
>>> > int
>>> > (for files decoding to float in the range [-1/256;1/256]).
>>> > Also float mantissa has 24 bits precision not 23.
>>>
>>> I really missed to describe this in the message.
>>>
>>> The exact multiples of 2 constants are the proper ones. You can see
>>> them used in integet->float conversion case.
>>> The decreased by 1 constants were used as a mean to avoid overflow
>>> when we have 1.0, however they shrink the range and it could  lead to
>>> distortion/fadedown if there are multiple back and forth conversions.
>>> Indeed the 1.0 would be clipped to 0.996093475, but all other values
>>> would be stable.
>>
>> However that clipping introduces a DC bias.
>> Not that it really matters though.
>>
>>> The ldexp() is used only as a faster way to multiply, as it changes
>>> only the exponent. It have same requirement as lrint (C99,
>>> _OPEN_SOURCE 600, etc). Do you prefer constant multiply or do you want
>>> all cases replaced with ldexp()?
>>
>> I don't really care unless someone finds a significant performance
>> difference.
>> But doing it inconsistently is a major maintenance issue since you
>> can never be sure whether there is maybe some minor but critical
>> difference
>> between the cases that must be preserved.
>>
>>> The float is 32 bits total, 1 bit sign, 8 bits exponent and 23 bits
>>> mantissa. But you are right, there is one bit that is not stored, but
>>> assumed. You are also correct for the loss of precision in this case.
>>
>> The loss of precision is 8 or 9 bits if the exponent is -8, not just
>> one.
>>
>>> 3. Doing the multiply of 2^31 first, and then clipping float numbers
>>> before doing rounding. Thanks to lrint default rounding we'd have to
>>> use (1<<31)-(1<<8) again. Either use trunc() or simple i=(int)f;
>>> conversion to avoid it.
>>
>> There is no problem with clipping to (1<<31)-(1<<8), that does not
>> lose precision in general. The problem is with extracting a 24 bit
>> int and left-shifting by 8 bit.
>> Also the max value is 1<<31 - 1 so that's what it should be clipped to.
>> There's also no real point in doing the multiply for the clipping case,
>> unless you can use that to save some cycles.
>> Not particularly fast, but I think it should cover everything except
>> probably NaNs (of course must be absolutely sure the compiler evaluates
>> "bits" to a constant):
>> #define CONV_F2S(in, out, bits)
>> if (in <= -1)
>>   out = -(1ULL << (bits - 1));
>> else if (in >= 1 - 1.0/(1ULL << bits))
>>   out = (1ULL << (bits - 1)) - 1;
>> else
>>   out = lrint(ldexp(in, bits - 1));
>
>
> I'll write more detailed explanation tomorrow.
> I tried benchmarking some of the implementations.
> So far llrint is the slowest on all CPU.
> ldexp is also slower than multiply on all CPU.
> (all cpu = Intel Clarkdale and AMD TBird).

I don't have time to do experiments atm (maybe at weekend),
so I've used your proposal for a fix in the 32 bit case.

If you are interested, it is about 5-7 times faster than the incorrect
code (and all the other correct cases that use av_clip), but that's
only on the Clarkdale, on my old AMD tbird, it is 10% slower. (both
with i386 instruction set, no sse or vectorization).

The funny thing is that the TBird run all the other cases about 4
times faster than the Clarkdale, despite having 1/3 clockspeed and a
lot less cache. I guess there is some other bottleneck that needs
investigation.

If you find more issues, fell free to commit right away.


More information about the MPlayer-cvslog mailing list