[FFmpeg-devel] [PATCH] h264.c/decode_cabac_residual optimization

Wed Jul 2 00:00:28 CEST 2008

"Siarhei Siamashka" <siarhei.siamashka at gmail.com> writes:

> On Tue, Jul 1, 2008 at 9:44 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Tue, Jul 01, 2008 at 01:14:56PM -0400, Alexander Strange wrote:
>> [...]
>>
>>> diff -ru --exclude='*svn*' ffmpeg-/libavcodec/h264.c ffmpeg/libavcodec/h264.c
>>> --- ffmpeg-/libavcodec/h264.c 2008-06-30 14:47:53.000000000 -0400
>>> +++ ffmpeg/libavcodec/h264.c  2008-06-30 14:47:59.000000000 -0400
>>> @@ -5517,7 +5517,7 @@
>>>          }
>>>      }
>>>
>>> -    for( coeff_count--; coeff_count >= 0; coeff_count-- ) {
>>> +    while( coeff_count-- ) {
>>>          uint8_t *ctx = coeff_abs_level1_ctx[node_ctx] + abs_level_m1_ctx_base;
>>>
>>>          int j= scantable[index[coeff_count]];
>>
>> ok if faster or same speed
>
> Pre-decrement is usually preferred in performance-sensitive code, as
> it is generally faster. Something like this would be better (it is
> also closer to the old code):
> while( --coeff_count >= 0 ) {
> ...
> }
>
> You can try to compile this sample with the best possible
> optimizations, look at the assembly output and check where the
> generated code is better and why:
>
> /**********************/
> int q();
>
> void f1(int n)
> {
>     while (--n >= 0) {
>         q();
>     }
> }
>
> void f2(int n)
> {
>     while (n--) {
>         q();
>     }
> }
> /**********************/

Any half-decent compiler should generate the same code for those two
functions.  GCC for ARM generates a slightly different, but
equivalent, setup sequence, and the loops are exactly the same.
I can't be bothered to check x86.
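
For anyone who wants to check the equivalence at the source level
rather than in the assembly, here is a minimal sketch, not part of the
patch. It records the index each loop body sees for the old for loop,
the pre-decrement form (f1) and the post-decrement form (f2). The
names trace_for, trace_predec, trace_postdec, same_trace and MAX_N are
made up for this illustration, and it assumes 0 <= n <= MAX_N.

```c
#include <string.h>

#define MAX_N 64  /* hypothetical bound for the trace buffers */

/* Old form from the patch: decrement once, then loop while non-negative. */
static int trace_for(int n, int *out)
{
    int k = 0;
    for (n--; n >= 0; n--)
        out[k++] = n;
    return k;
}

/* Pre-decrement form (f1 above). */
static int trace_predec(int n, int *out)
{
    int k = 0;
    while (--n >= 0)
        out[k++] = n;
    return k;
}

/* Post-decrement form (f2 above, and the new loop in the patch). */
static int trace_postdec(int n, int *out)
{
    int k = 0;
    while (n--)
        out[k++] = n;
    return k;
}

/* Returns 1 if all three forms see the same indices, in the same order. */
int same_trace(int n)
{
    int a[MAX_N], b[MAX_N], c[MAX_N];
    int ka, kb, kc;

    if (n < 0 || n > MAX_N)
        return 0;   /* outside the range this sketch covers */

    ka = trace_for(n, a);
    kb = trace_predec(n, b);
    kc = trace_postdec(n, c);

    return ka == n && kb == n && kc == n &&
           memcmp(a, b, n * sizeof(int)) == 0 &&
           memcmp(a, c, n * sizeof(int)) == 0;
}
```

same_trace(n) should return 1 for every n from 0 to MAX_N. Note that
the forms are only interchangeable when the count is non-negative on
entry: with a negative n, the for loop and the pre-decrement form run
zero times, while the post-decrement form keeps iterating because its
condition is already non-zero.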

-- 
Måns Rullgård
mans at mansr.com