[FFmpeg-devel] [PATCH] Mimic decoder

Mon Mar 17 20:01:05 CET 2008

Hello Michael,

Michael Niedermayer wrote:
> On Sun, Mar 16, 2008 at 04:15:41PM -0300, Ramiro Polla wrote:
> [...]
>>> [...]
>>>>>>> and
>>>>>>> value= 1 - (1<<num_bits);
>>>>>>> if(num_bits>1)
>>>>>>>     value+= get_bits(num_bits-1);
>>>>>>> if(get_bits1())
>>>>>>>     value -= value;
>>>>>>> block[ctx->scantable.permutated[pos]] = value;
>>>>>>> Or if you think the table matters speedwise then make it static dont
>>>>>>> duplicate it in every decoder instance.
>>>>>>           ref   calc  static
>>>>>> mimic.o : 51868 51304 52192
>>>>>> avg time: 1.692 1.712 1.728
>>>>>>
>>>>>> That's very odd... Shouldn't the static table be at least as fast as 
>>>>>> the
>>>>>> current code? 
>>>>> Yes
>>>>> If you want to debug it the asm output from gcc would be a start but i
>>>>> doubt its worth the time spend.
>>>> With a bigger sample, I got
>>>>           ref   calc  static
>>>> avg time: 6.688 6.932 6.744
>>> And the standard deviations? Mean is only one side of the story ...
>> The more tests I run, the more innacurate they become.
>>
>>           ref      calc     static
>> avg time: 6.819993 6.873992 6.823993
>> std dev : 0.017436 0.022716 0.022627
>>
>> What's the best way to run benchmarks?
> 
> Id try START/STOP_TIMER around the function you try to benchmark but
> watch the skip number if the amount of time spend in the function can vary
> alot.
> 
> Anyway the difference between ref and static is considerably less than the
> standard deviation so i think the argument that its faster than static is
> about as likely as the opposit argument.
> 
> 
>> With no X, I run 20 runs of
>> ./ffmpeg_g -i ../data/y.ml20 -vcodec rawvideo -f rawvideo -r 5 -y /dev/null
>>
>> Is there a way to just benchmark demuxing+decoding without anything else? 
>> (besides writing my own simple program).
>>
>> The input video has less than 5 frames per second, so I use -r 5 to avoid 
>> outputting 1000 fps.
>>
>>>>>> Attached are the 2 patches I used to test. (lut_calc.diff and 
>>>>>> lut_static.diff). Anyways, most FFmpeg stuff introduced a considerable 
>>>>>> speedup over the original code, so I'm tempted to apply one of these 
>>>>>> two.
>>>>>> Could I use the hardcoded tables ifdef there?
>>>>> I dont see much sense in it. I mean the calc code doesnt need the table 
>>>>> at
>>>>> all and needs around 100 byte while the table needs 1024.
>>>> If it's ok I'd prefer to keep the current code with vlcdec_lookup in 
>>>> MimicContext.
>>> You can keep it if you merge the *qscale into the table as its not
>>> const then anymore ...
>> I don't understand how. qscale depends on quality, which can change per 
>> frame.
> 
> Recalculate the table when it changed.
> 
> 
>> qscale also depends on pos...
> 
> you could have a second table
> 
> anyway thats just a suggestion, i am not saying that it would be a good idea
> what i am saying is that having a const table in the context is a bad idea.

Well, the values can be be different for chroma and luma samples as they 
have a different min value for qscale. So I'd need to keep at least two 
tables, and also pass that to vlc_decode_block. The tables would need to 
change when quality!=ctx->quality and I'd need to keep track of quality. 
Then there's the fact that pos 1 and 2 don't use qscale but rather a 
fixed <<. Then I'd need more tables for that, or another way to work 
around it...

Made static, finally. It's simpler. =)

> [...]
> 
>>     for(plane = 0; plane < 3; plane++) {
>>         const int is_chroma = !!plane;
>>         const int qscale = av_clip(10000-quality,is_chroma?1000:2000,10000)<<2;
>>         const int stride = cur_frame.linesize[plane];
>>         const uint8_t *base_src = prev_frame.data[plane];
>>         uint8_t *base_dst = cur_frame. data[plane];
>                                         ^
> random space?

Removed. At some revision it used to make things look cleaner...

> [...]
>>             ctx->num_vblocks[i] = height >> (3 + !!i);
>>             ctx->num_hblocks[i] = width  >> (3 + !!i);
>>             if(i && height & 15)
>>                 ctx->num_vblocks[i]++;
> 
> ctx->num_vblocks[i] = -((-height) >> (3 + !!i));

Hmm... wouldn't that be only if height was negative? It is always 
positive in the data.

>>         }
>>     } else
>>     if(width != ctx->avctx->width || height != ctx->avctx->height) {
>>         av_log(avctx, AV_LOG_ERROR, "resolution changing is not supported\n");
>>         return -1;
>>     }
>>
> 
>>     ctx->picture.data[0] = NULL;
>>     ctx->picture.reference = 1;
>>     if(avctx->get_buffer(avctx, &ctx->picture)) {
> 
> the = NULL before get_buffer() looks very suspicious, you arent forgetting to
> release them somewhere?

The decoder always get_buffer for a new picture at the start of 
decode_frame. That frame is kept to be used as back-reference for 
another 16 frames. At the end of decode_frame, it releases the last 
frame if it's no longer needed. decode_end then releases all frames.

Ramiro Polla
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mimic.c
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080317/8ae10946/attachment.asc>