[FFmpeg-devel] [PATCH] Optimization of original IFF codec

Fri Apr 23 13:29:52 CEST 2010

Sebastian Vater <cdgs.basty at googlemail.com> writes:

> Hi a 38573583th time!
>
> M?ns Rullg?rd a ?crit :
>> I prefer to write just unsigned, saving a few keystrokes, but others
>> may disagree.  Do as you please.
>>   
> Fixed!
>>>      if (avctx->bits_per_coded_sample > 8) {
>>>          av_log(avctx, AV_LOG_ERROR, "bit_per_coded_sample > 8 not supported\n");
>>> @@ -52,8 +53,8 @@
>>>          av_log(avctx, AV_LOG_ERROR, "palette data underflow\n");
>>>          return AVERROR_INVALIDDATA;
>>>      }
>>> -    for (i=0; i < count; i++) {
>>> -        pal[i] = 0xFF000000 | AV_RB24( avctx->extradata + i*3 );
>>> +    for (i=0, j=0; i < count; i++, j += 3) {
>>> +        pal[i] = 0xFF000000 | AV_RB24( avctx->extradata + j);
>>>      }
>>>      return 0;
>>>  }
>>>     
>>
>> Did you check if this makes any difference in the generated code?  The
>> original is more readable.
>>   
> No, and I doubt that many compilers can do such stuff automatically.

Most of them can and do.

>>> @@ -71,7 +72,7 @@
>>>          return AVERROR_INVALIDDATA;
>>>      }
>>>  
>>> -    s->planesize = avctx->width / 8;
>>> +    s->planesize = avctx->width >> 3;
>>>      s->planebuf = av_malloc(s->planesize + FF_INPUT_BUFFER_PADDING_SIZE);
>>>      if (!s->planebuf)
>>>          return AVERROR(ENOMEM);
>>>     
>>
>> This is most definitely a number being divided.
>>   
> Fixed.

OTOH, I just noticed it's declared as signed for whatever reason...

>>> @@ -95,12 +96,16 @@
>>>   * @param plane plane number to decode as
>>>   */
>>>  #define DECLARE_DECODEPLANE(suffix, type) \
>>> -static void decodeplane##suffix(void *dst, const uint8_t *const buf, int buf_size, int bps, int plane) \
>>> +static inline void decodeplane##suffix(void *dst, \
>>> +                                       const uint8_t *const buf, \
>>> +                                       const unsigned int buf_size, \
>>> +                                       const unsigned int bps, \
>>> +                                       const unsigned int plane) \
>>>  { \
>>>      GetBitContext gb; \
>>> -    int i, b; \
>>> +    unsigned int i, b; \
>>>      init_get_bits(&gb, buf, buf_size * 8); \
>>> -    for(i = 0; i < (buf_size * 8 + bps - 1) / bps; i++) { \
>>> +    for(i = 0; i < ((buf_size << 3) + bps - 1) / bps; i++) { \
>>>          for (b = 0; b < bps; b++) { \
>>>              ((type *)dst)[ i*bps + b ] |= get_bits1(&gb) << plane; \
>>>          } \
>>>     
>>
>> Once again, did you check the generated code?  If it makes no
>> difference, the original code is better.
>>   
> I don't have to check the generated code to know that changing signed to
> unsigned will change signed multiplication to unsigned multiplication
> assembly instruction ;)

Yes, you do.  When the high half of the result is discarded, as with
32x32 -> 32 multiplication, there is no difference.  The compiler will
(hopefully) use whichever is faster.

> Also there are many architectures out there, which do unsigned
> multiplication faster than signed. This even is more true for
> divisions and if you look at the for body...

Divisions are much more evil than multiplications.  Multiplications
take a few cycles at most, while division often requires a function
call and at least 50 cycles.

In that loop, I'd double-check that gcc really computes the end
condition only once, not on every iteration of the loop.  Yes, I've
seen it do stupid things like that.

>> NEVER make changes like this without looking at the output
> Just have a knot in my brain, what was the way to extract disassembly
> from object files?

objdump -d

-- 
M?ns Rullg?rd
mans at mansr.com