[FFmpeg-devel] [PATCH] h264.c/decode_cabac_residual optimization
Måns Rullgård
mans
Wed Jul 2 00:00:28 CEST 2008
"Siarhei Siamashka" <siarhei.siamashka at gmail.com> writes:
> On Tue, Jul 1, 2008 at 9:44 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Tue, Jul 01, 2008 at 01:14:56PM -0400, Alexander Strange wrote:
>> [...]
>>
>>> diff -ru --exclude='*svn*' ffmpeg-/libavcodec/h264.c ffmpeg/libavcodec/h264.c
>>> --- ffmpeg-/libavcodec/h264.c 2008-06-30 14:47:53.000000000 -0400
>>> +++ ffmpeg/libavcodec/h264.c 2008-06-30 14:47:59.000000000 -0400
>>> @@ -5517,7 +5517,7 @@
>>> }
>>> }
>>>
>>> - for( coeff_count--; coeff_count >= 0; coeff_count-- ) {
>>> + while( coeff_count-- ) {
>>> uint8_t *ctx = coeff_abs_level1_ctx[node_ctx] + abs_level_m1_ctx_base;
>>>
>>> int j= scantable[index[coeff_count]];
>>
>> ok if faster or same speed
>
> Pre-decrement is typically preferred in code optimized for
> performance, as it is generally faster. Something like this would be
> better (it is also closer to the old code):
> while( --coeff_count >= 0 ) {
> ...
> }
>
> You can try to compile this sample with the best possible
> optimizations, look at the assembly output and check where the
> generated code is better and why:
>
> /**********************/
> int q();
>
> void f1(int n)
> {
>     while (--n >= 0) {
>         q();
>     }
> }
>
> void f2(int n)
> {
>     while (n--) {
>         q();
>     }
> }
> /**********************/
Any half-decent compiler should generate the same code for those two
functions. GCC for ARM generates a slightly different, but
equivalent, setup sequence, and the loops are exactly the same.
I can't be bothered to check x86.
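
For anyone who wants to confirm the two loop forms behave the same
before comparing assembly, here is a minimal sketch. The counter
`calls` and the helper `count_calls` are hypothetical additions for
illustration and are not part of the sample above. q() is replaced by
a stub that only counts its invocations.

```c
/* Hypothetical call counter; stands in for whatever q() really does. */
int calls;

void q(void)
{
    calls++;
}

/* Pre-decrement form, as in f1 from the quoted sample. */
void f1(int n)
{
    while (--n >= 0) {
        q();
    }
}

/* Post-decrement form, as in f2 from the quoted sample. */
void f2(int n)
{
    while (n--) {
        q();
    }
}

/* Hypothetical helper: run one loop form and report how often q() ran. */
int count_calls(void (*loop)(int), int n)
{
    calls = 0;
    loop(n);
    return calls;
}
```

For any n >= 0 both count_calls(f1, n) and count_calls(f2, n) return
n. The forms only differ for negative n: f1 exits immediately, while
f2 keeps decrementing until the signed counter overflows, which is
undefined behaviour. To compare the generated code, something like
`gcc -O2 -S` on the file emits the assembly for inspection.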
--
Måns Rullgård
mans at mansr.com