[FFmpeg-devel] [PATCH] h264.c/decode_cabac_residual optimization
Laurent Desnogues
laurent.desnogues
Wed Jul 2 12:05:17 CEST 2008
On Wed, Jul 2, 2008 at 11:45 AM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
[...]
>>> /**********************/
>>> int q();
>>>
>>> void f1(int n)
>>> {
>>> while (--n >= 0) {
>>> q();
>>> }
>>> }
>>>
>>> void f2(int n)
>>> {
>>> while (n--) {
>>> q();
>>> }
>>> }
>>> /**********************/
>>
>> Any half-decent compiler should generate the same code for those two
>> functions.
>
> That's not true, just because these two functions are not identical.
> Hint: what happens if you pass -1 or any other negative value to these
> functions?
>
>> GCC for ARM generates a slightly different, but equivalent, setup sequence, and the loops are exactly the same.
>
> In my case, gcc 3.4.4 (using '-march=armv6 -O3 -c' options) generated
> the following assembly output, which is definitely better for 'f1' (3
> instructions in the inner loop instead of 4):
>
> 00000000 <f1>:
> 0: e92d4010 stmdb sp!, {r4, lr}
> 4: e2504001 subs r4, r0, #1 ; 0x1
> 8: 48bd8010 ldmmiia sp!, {r4, pc}
> c: ebfffffe bl 0 <q>
> 10: e2544001 subs r4, r4, #1 ; 0x1
> 14: 5afffffc bpl c <f1+0xc>
> 18: e8bd8010 ldmia sp!, {r4, pc}
>
> 0000001c <f2>:
> 1c: e92d4010 stmdb sp!, {r4, lr}
> 20: e2504001 subs r4, r0, #1 ; 0x1
> 24: 38bd8010 ldmccia sp!, {r4, pc}
> 28: e2444001 sub r4, r4, #1 ; 0x1
> 2c: ebfffffe bl 0 <q>
> 30: e3740001 cmn r4, #1 ; 0x1
> 34: 1afffffb bne 28 <q+0x28>
> 38: e8bd8010 ldmia sp!, {r4, pc}
>
> I'm curious, what is the output of your compiler?

CSL 2007q3 and 2008q1 both generate this:

00000000 <f2>:
0: e92d4070 push {r4, r5, r6, lr}
4: e2505000 subs r5, r0, #0 ; 0x0
8: 08bd8070 popeq {r4, r5, r6, pc}
c: e3a04000 mov r4, #0 ; 0x0
10: e2844001 add r4, r4, #1 ; 0x1
14: ebfffffe bl 0 <q>
18: e1540005 cmp r4, r5
1c: 1afffffb bne 10 <f2+0x10>
20: e8bd8070 pop {r4, r5, r6, pc}

00000024 <f1>:
24: e3500001 cmp r0, #1 ; 0x1
28: e92d4070 push {r4, r5, r6, lr}
2c: e1a05000 mov r5, r0
30: 48bd8070 popmi {r4, r5, r6, pc}
34: e3a04000 mov r4, #0 ; 0x0
38: e2844001 add r4, r4, #1 ; 0x1
3c: ebfffffe bl 0 <q>
40: e1540005 cmp r4, r5
44: 1afffffb bne 38 <q+0x38>
48: e8bd8070 pop {r4, r5, r6, pc}

Laurent
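
To make the quoted hint about negative arguments concrete, here is a minimal sketch that counts how often q() is called by each loop shape. The names count_f1, count_f2, self_check and the cap parameter are hypothetical and not part of the original test case; the cap is only there so that the f2-style loop stops for negative n instead of decrementing until the counter overflows.

```c
#include <assert.h>

/* Number of q() calls made by the most recent count_* run. */
static long calls;

/* Stand-in for the external q() of the original test case. */
static int q(void)
{
    calls++;
    return 0;
}

/* Loop shape of f1: stops as soon as the decremented n is negative.
 * The cap is an added safety limit, not part of the original loop. */
long count_f1(int n, long cap)
{
    calls = 0;
    while (--n >= 0 && calls < cap)
        q();
    return calls;
}

/* Loop shape of f2: stops only when n is exactly zero before the decrement.
 * For negative n this never happens within the cap. */
long count_f2(int n, long cap)
{
    calls = 0;
    while (n-- && calls < cap)
        q();
    return calls;
}

/* The two shapes agree for n >= 0 and differ for negative n. */
void self_check(void)
{
    assert(count_f1(3, 1000) == 3);
    assert(count_f2(3, 1000) == 3);
    assert(count_f1(-1, 1000) == 0);
    assert(count_f2(-1, 1000) == 1000); /* ran until the cap */
}
```

For non-negative n both loops call q() exactly n times, which is why a compiler may share the loop body, but the entry test has to differ: the f1 shape returns immediately for any n < 1, while the f2 shape returns immediately only for n == 0.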