[FFmpeg-devel] [PATCH] Optimization of original IFF codec
Måns Rullgård
mans
Fri Apr 23 13:29:52 CEST 2010
Sebastian Vater <cdgs.basty at googlemail.com> writes:
> Hi a 38573583th time!
>
> M?ns Rullg?rd a ?crit :
>> I prefer to write just unsigned, saving a few keystrokes, but others
>> may disagree. Do as you please.
>>
> Fixed!
>>> if (avctx->bits_per_coded_sample > 8) {
>>> av_log(avctx, AV_LOG_ERROR, "bit_per_coded_sample > 8 not supported\n");
>>> @@ -52,8 +53,8 @@
>>> av_log(avctx, AV_LOG_ERROR, "palette data underflow\n");
>>> return AVERROR_INVALIDDATA;
>>> }
>>> - for (i=0; i < count; i++) {
>>> - pal[i] = 0xFF000000 | AV_RB24( avctx->extradata + i*3 );
>>> + for (i=0, j=0; i < count; i++, j += 3) {
>>> + pal[i] = 0xFF000000 | AV_RB24( avctx->extradata + j);
>>> }
>>> return 0;
>>> }
>>>
>>
>> Did you check if this makes any difference in the generated code? The
>> original is more readable.
>>
> No, and I doubt that many compilers can do such stuff automatically.
Most of them can and do.
>>> @@ -71,7 +72,7 @@
>>> return AVERROR_INVALIDDATA;
>>> }
>>>
>>> - s->planesize = avctx->width / 8;
>>> + s->planesize = avctx->width >> 3;
>>> s->planebuf = av_malloc(s->planesize + FF_INPUT_BUFFER_PADDING_SIZE);
>>> if (!s->planebuf)
>>> return AVERROR(ENOMEM);
>>>
>>
>> This is most definitely a number being divided.
>>
> Fixed.
OTOH, I just noticed it's declared as signed for whatever reason...
>>> @@ -95,12 +96,16 @@
>>> * @param plane plane number to decode as
>>> */
>>> #define DECLARE_DECODEPLANE(suffix, type) \
>>> -static void decodeplane##suffix(void *dst, const uint8_t *const buf, int buf_size, int bps, int plane) \
>>> +static inline void decodeplane##suffix(void *dst, \
>>> + const uint8_t *const buf, \
>>> + const unsigned int buf_size, \
>>> + const unsigned int bps, \
>>> + const unsigned int plane) \
>>> { \
>>> GetBitContext gb; \
>>> - int i, b; \
>>> + unsigned int i, b; \
>>> init_get_bits(&gb, buf, buf_size * 8); \
>>> - for(i = 0; i < (buf_size * 8 + bps - 1) / bps; i++) { \
>>> + for(i = 0; i < ((buf_size << 3) + bps - 1) / bps; i++) { \
>>> for (b = 0; b < bps; b++) { \
>>> ((type *)dst)[ i*bps + b ] |= get_bits1(&gb) << plane; \
>>> } \
>>>
>>
>> Once again, did you check the generated code? If it makes no
>> difference, the original code is better.
>>
> I don't have to check the generated code to know that changing signed to
> unsigned will change signed multiplication to unsigned multiplication
> assembly instruction ;)
Yes, you do. When the high half of the result is discarded, as with
32x32 -> 32 multiplication, there is no difference. The compiler will
(hopefully) use whichever is faster.
> Also there are many architectures out there, which do unsigned
> multiplication faster than signed. This even is more true for
> divisions and if you look at the for body...
Divisions are much more evil than multiplications. Multiplications
take a few cycles at most, while division often requires a function
call and at least 50 cycles.
In that loop, I'd double-check that gcc really computes the end
condition only once, not on every iteration of the loop. Yes, I've
seen it do stupid things like that.
>> NEVER make changes like this without looking at the output
> Just have a knot in my brain, what was the way to extract disassembly
> from object files?
objdump -d
--
M?ns Rullg?rd
mans at mansr.com
More information about the ffmpeg-devel
mailing list