[FFmpeg-devel] [PATCH] h264.c/decode_cabac_residual optimization

Laurent Desnogues laurent.desnogues
Wed Jul 2 12:05:17 CEST 2008


On Wed, Jul 2, 2008 at 11:45 AM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
[...]
>>> /**********************/
>>> int q();
>>>
>>> void f1(int n)
>>> {
>>>     while (--n >= 0) {
>>>         q();
>>>     }
>>> }
>>>
>>> void f2(int n)
>>> {
>>>     while (n--) {
>>>         q();
>>>     }
>>> }
>>> /**********************/
>>
>> Any half-decent compiler should generate the same code for those two
>> functions.
>
> That's not true, simply because these two functions are not equivalent.
> Hint: what happens if you pass -1 or any other negative value to these
> functions?
>
>> GCC for ARM generates a slightly different, but equivalent, setup
>> sequence, and the loops are exactly the same.
>
> In my case, gcc 3.4.4 (using '-march=armv6 -O3 -c' options) generated
> the following assembly output, which is definitely better for 'f1' (3
> instructions in the inner loop instead of 4):
>
> 00000000 <f1>:
>   0:   e92d4010        stmdb   sp!, {r4, lr}
>   4:   e2504001        subs    r4, r0, #1      ; 0x1
>   8:   48bd8010        ldmmiia sp!, {r4, pc}
>   c:   ebfffffe        bl      0 <q>
>  10:   e2544001        subs    r4, r4, #1      ; 0x1
>  14:   5afffffc        bpl     c <f1+0xc>
>  18:   e8bd8010        ldmia   sp!, {r4, pc}
>
> 0000001c <f2>:
>  1c:   e92d4010        stmdb   sp!, {r4, lr}
>  20:   e2504001        subs    r4, r0, #1      ; 0x1
>  24:   38bd8010        ldmccia sp!, {r4, pc}
>  28:   e2444001        sub     r4, r4, #1      ; 0x1
>  2c:   ebfffffe        bl      0 <q>
>  30:   e3740001        cmn     r4, #1  ; 0x1
>  34:   1afffffb        bne     28 <q+0x28>
>  38:   e8bd8010        ldmia   sp!, {r4, pc}
>
> I'm curious, what is the output of your compiler?

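The CodeSourcery (CSL) 2007q3 and 2008q1 toolchains both generate this
(the loop bodies are identical, four instructions each; only the setup
sequences differ):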

00000000 <f2>:
   0:   e92d4070        push    {r4, r5, r6, lr}
   4:   e2505000        subs    r5, r0, #0      ; 0x0
   8:   08bd8070        popeq   {r4, r5, r6, pc}
   c:   e3a04000        mov     r4, #0  ; 0x0
  10:   e2844001        add     r4, r4, #1      ; 0x1
  14:   ebfffffe        bl      0 <q>
  18:   e1540005        cmp     r4, r5
  1c:   1afffffb        bne     10 <f2+0x10>
  20:   e8bd8070        pop     {r4, r5, r6, pc}

00000024 <f1>:
  24:   e3500001        cmp     r0, #1  ; 0x1
  28:   e92d4070        push    {r4, r5, r6, lr}
  2c:   e1a05000        mov     r5, r0
  30:   48bd8070        popmi   {r4, r5, r6, pc}
  34:   e3a04000        mov     r4, #0  ; 0x0
  38:   e2844001        add     r4, r4, #1      ; 0x1
  3c:   ebfffffe        bl      0 <q>
  40:   e1540005        cmp     r4, r5
  44:   1afffffb        bne     38 <q+0x38>
  48:   e8bd8070        pop     {r4, r5, r6, pc}
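
To make the hint about negative values concrete, here is a minimal sketch
that counts how often q() runs in each loop. The count_f1/count_f2 names
and the `limit` parameter are my own additions for illustration, not part
of the original test case; the cap is only there so that the f2-style loop
stops instead of decrementing a negative n until it overflows.

```c
/* Sketch only: count_f1, count_f2 and `limit` are hypothetical helpers. */

static int calls;               /* number of times q() has run */

int q(void)
{
    return ++calls;
}

/* Same loop as f1, plus a cap on the number of calls. */
int count_f1(int n, int limit)
{
    calls = 0;
    while (--n >= 0 && calls < limit) {
        q();
    }
    return calls;
}

/* Same loop as f2, plus a cap on the number of calls. */
int count_f2(int n, int limit)
{
    calls = 0;
    while (n-- && calls < limit) {
        q();
    }
    return calls;
}
```

For n >= 0 both helpers return n. For n = -1, count_f1 returns 0, while
count_f2 keeps going until it hits the cap, since n-- only evaluates to
zero when n is exactly 0. That matches the different entry tests in the
listings above (popmi/ldmmi for f1, popeq/ldmcc for f2).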


Laurent



