[FFmpeg-devel] [PATCH] ALS: Solve Issue 1657

Thu Jan 7 01:19:57 CET 2010

Am 06.01.10 20:46, schrieb Michael Niedermayer:
> On Wed, Jan 06, 2010 at 01:21:10AM +0100, Thilo Borgmann wrote:
>> Am 05.01.10 23:38, schrieb Michael Niedermayer:
>>> On Tue, Jan 05, 2010 at 02:58:56PM +0100, Thilo Borgmann wrote:
>>>> Am 05.01.10 12:11, schrieb Michael Niedermayer:
>>>>> On Tue, Jan 05, 2010 at 03:59:41AM +0100, Thilo Borgmann wrote:
>>>>>> Am 05.01.10 02:43, schrieb Michael Niedermayer:
>>>>>>> On Tue, Jan 05, 2010 at 02:23:09AM +0100, Thilo Borgmann wrote:
>>>>>>>> Am 05.01.10 01:43, schrieb Michael Niedermayer:
>>>>>>>>> On Tue, Jan 05, 2010 at 12:34:53AM +0100, Thilo Borgmann wrote:
>>>>>>>>>> Am 05.01.10 00:30, schrieb Thilo Borgmann:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> issue 1657 seems to be caused by negative indices used in [].
>>>>>>>>>>> See: http://roundup.ffmpeg.org/roundup/ffmpeg/issue1657
>>>>>>>>>>>
>>>>>>>>>>> Using *() resolves this issue.
>>>>>>>>>>>
>>>>>>>>>>> Tested with gcc 4.0 on MacOS 10.6. There were other versions/compilers
>>>>>>>>>>> mentioned in roundup, maybe these could be tested by someone (you)?
>>>>>>>>>>>
>>>>>>>>>>> I'm sorry, my svn still seems to be broken and produces unusable patches
>>>>>>>>>>> (%ld...). Nevertheless I can apply them if the workaround is ok.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Some artifacts left in als_data.h. Ignore the old patch, updated patch
>>>>>>>>>> attached.
>>>>>>>>>>
>>>>>>>>>> -Thilo
>>>>>>>>>
>>>>>>>>>>  alsdec.c |    4 ++--
>>>>>>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>>> 987821d84540420efa6f2e67be17094074e638f8  als_issue1657.rev1.patch
>>>>>>>>>> Index: libavcodec/alsdec.c
>>>>>>>>>> ===================================================================
>>>>>>>>>> --- libavcodec/alsdec.c	(Revision 21025)
>>>>>>>>>> +++ libavcodec/alsdec.c	(Arbeitskopie)
>>>>>>>>>> @@ -%ld,%ld +%ld,%ld @@
>>>>>>>>>>              y = 1 << 19;
>>>>>>>>>>  
>>>>>>>>>>              for (sb = 0; sb < smp; sb++)
>>>>>>>>>> -                y += MUL64(lpc_cof[sb],raw_samples[smp - (sb + 1)]);
>>>>>>>>>> +                y += MUL64(lpc_cof[sb], *(raw_samples + smp - (sb + 1)));
>>>>>>>>>
>>>>>>>>> patch ok
>>>>>>>>
>>>>>>>> Applied.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> independant of this, it could be that if lpc_cof was reversed
>>>>>>>>>
>>>>>>>>> for (sb = 0; sb < smp; sb++)
>>>>>>>>>     y += MUL64(lpc_cof[sb], raw_samples[sb]);
>>>>>>>>
>>>>>>>> In the second case the index has to be negative which is not possible
>>>>>>>> with this approach.
>>>>>>>
>>>>>>> i dont see why
>>>>>>> also it should be 
>>>>>>> raw_samples[sb]
>>>>>>> and
>>>>>>> raw_samples++
>>>>>>
>>>>>> Hmm. If you could please elaborate a bit more, I will pick this up
>>>>>> tomorrow and will see if I can make it better.
>>>>>
>>>>> one way:
>>>>>
>>>>> -    for (; smp < bd->block_length; smp++) {
>>>>> -        y = 1 << 19;
>>>>> -
>>>>> -        for (sb = 0; sb < opt_order; sb++)
>>>>> -            y += MUL64(bd->lpc_cof[sb],raw_samples[smp - (sb + 1)]);
>>>>> -
>>>>> -        raw_samples[smp] -= y >> 20;
>>>>> +    for (raw_samples+=smp; raw_samples < raw_samples_end; raw_samples++) {
>>>>> +        y = 1 << 19;
>>>>> +
>>>>> +        for (sb= -opt_order; sb<0; sb++)
>>>>> +            y += MUL64(lpc_cof[sb],raw_samples[sb]);
>>>>> +
>>>>> +        raw_samples[0] -= y >> 20;
>>>>>     }
>>>>
>>>> Ok I see, slightly different patch attached that has a ~20% speed gain
>>>> in the second loop alone. Also issue 1657 is avoided by not using smp in
>>>> substraction (and yes, tested with 64-bit & gcc 4.0).
>>>>
>>>
>>>> To have lpc_cof[sb] as well as raw_samples[sb] without math in the
>>>> indices, lpc_cof has to be reverted and its pointer shifted to one next
>>>> to the end (I think...) - this would make it too obfuscated IMO.
>>>
>>> could you try? it would be interresting to see how much faster it is
>>
>> Seems not to be faster. Also, reversed order of lpc_cof would shift this
>> math from these loops into others and reversion has also to be done
>> somewhere.
>>
>> 17446 dezicycles in reversed array, 2042 runs, 6 skips
>> 18731 dezicycles in reversed array, 2038 runs, 10 skips
>> 17446 dezicycles in reversed array, 2042 runs, 6 skips
>>
>> 17651 dezicycles in simple math, 2040 runs, 8 skips
>> 18497 dezicycles in simple math, 2039 runs, 9 skips
>> 17696 dezicycles in simple math, 2038 runs, 10 skips
> 
> where can i find an ALS file for trying myself?

http://samples.mplayerhq.hu/fate-suite/lossless-audio/inside-als.mp4
is ready for download, if you would like to test with the very same
file, I could also provide that somehow.

> 
> 
>>
>>
>>
>> Code snippet:
>>
>>>    // reverse linear prediction coefficients for efficiency
>>>    for (k = 0; k < opt_order; k++)
>>>        lpc_cof_reversed[k] = lpc_cof[opt_order - (k + 1)];
>>>
>>> START_TIMER;
>>>    // reconstruct raw samples
>>>    raw_samples = bd->raw_samples + smp;
>>>    lpc_cof     = lpc_cof_reversed + opt_order;
>>>
>>>    for (; smp < bd->block_length; smp++) {
>>>        y = 1 << 19;
>>>
>>> // simple:
>>> //        for (sb = 0; sb < opt_order; sb++)
>>> //            y += MUL64(bd->lpc_cof[sb], raw_samples[-(sb + 1)]);
>>> // reversed:
>>>        for (sb = -opt_order; sb < 0; sb++)
>>>            y += MUL64(lpc_cof[sb], raw_samples[sb]);
>>>
>>>        *raw_samples++ -= y >> 20;
>>>    }
> 
> checking the end via raw_samples < end should be faster
> (1 variable less, 1 ++ less)
> also if this is on x86_64 try to make sb long instead of int this
> too might help

Yes and yes:

13869 dezicycles in reversed array & end, 2039 runs, 9 skips
13933 dezicycles in reversed array & end, 2040 runs, 8 skips
14816 dezicycles in reversed array & end, 2039 runs, 9 skips

11919 dezicycles in reversed array & end & long sb, 2040 runs, 8 skips
11947 dezicycles in reversed array & end & long sb, 2039 runs, 9 skips
12038 dezicycles in reversed array & end & long sb, 2040 runs, 8 skips

If this long sb shall make it into a patch, how to check for 64-bit,
e.g. using int or long, the ffmpeg-way?

-Thilo