[FFmpeg-devel] Fix MMX dct_quantize for non zigzag_direct scans

Fri May 16 01:55:03 CEST 2008

Hi,

Michael Niedermayer wrote:
> On Wed, May 14, 2008 at 11:25:43PM +0100, Ramiro Polla wrote:
>> Hello,
>>
>>>>> and id like to see benchmarks as well, so we can be sure this doesnt
>>>>> slow the code down
>>>> Adding custom permutation code for ff_alternate_vertical_scan the same 
>>>> way it's done for ff_zigzag_direct gives these results (the source file 
>>>> is properly cached in memory, 10 runs removing the highest and lowest):
>>>>
>>>> ./ffmpeg_g -benchmark -s 352x288 -i paris.cif -flags +alt -vcodec mpeg4 
>>>> -y -f rawvideo /dev/null
>>>>
>>>>       ref     new
>>>> avg   2.7905  2.7390
>>>> stdev 0.0318  0.0287
>>> I do not care at all about alternate_scan speed. I care about 
>>> zigzag_direct
>>> speed! Its whats used 99% of the time.
>>> even a 0.1% speedloss for zigzag means rejected patch!
>> Tested with a bigger cif sample.
>>
>>       ref     new
>> avg   7.9774  7.9652
>> stdev 0.0433  0.0396
>>
>> I ran the tests on init 1 with everything that I could stop (no network, no 
>> log daemon, no fs journaling, no cron, no bunch of daemons...), and 
>> couldn't get the stdev to be below the 0.1% you want.
> 
> :(
> 
> what values does:
> for(i=0; i<N; i++)
>     for(j=0; j<N; j++){
>         if(time_ref[i] < time_new[j])
>             ref_better++;
>         if(time_ref[i] > time_new[j])
>             new_better++;
>     }
> 
> result in?

new tests with:
./ffmpeg_g -v 2 -s 352x288 -i paris.cif -vcodec mpeg1video -f rawvideo 
-y /dev/null

in init 1 with lots of stuff stopped.

ref_better: 15
new_better: 49

       ref     new
avg   7.3245  7.2870
stdev 0.0457  0.0342

>> Benchmarking with START/STOP_TIMER isn't very good since the runs can vary 
>> on the time they take depending on last_non_zero. Also the patch changes 
>> not only the MMX code but removes the hack in mpegvideo_enc.c.
> 
> *_TIMER around the macroblock encode function seems an option
> also the automatic threshold calculation could be replaced by a constant
> to avoid excessive skips.

I added *_TIMERs in first and last line of encode_mb_internal. These are 
the results:

around 1048576 runs
ref          new
time   skips time   skips
42612  3033  44422  2889
42551  2981  43440  2712
42553  2846  43711  2991
41825  2895  44401  3036
41812  2925  43691  2878
41792  2858  43137  2873
42458  2973  43482  2945
42388  3078  43553  2766
42366  2888  43450  2987
42721  3003  43163  2996
42210  2964  43956  2954
42299        43674        avg
341.8448     433.0284     stdev

>> The changes for direct zigzag are basically:
>> - read inverse from context instead of from static array. (shouldn't slow 
>> anything down)
>> - remove an if(s->alternate_scan) from mpegvideo_enc.c
>> - add an if(s->intra_scantable.scantable == ff_zigzag_direct) to MMX code.
> 
> ill probably ok this if the benchmarks are inconclusive ...
> 
> 
> 
>> I think this is a good solution that shows no measurable speed loss. I can 
>> test more if someone knows a way to make incredibly accurate benchmarking 
>> (<0.1% stdev).
>>
>> We could also have two functions of each, one hardcoded to direct zigzag 
>> and another more generic, that can be set in MPV_common_init_mmx() (under 
>> #ifdef CONFIG_SMALL). I attached an example of what could be done. Another 
>> way would be if my first patch is accepted, have an av_always_inline 
>> dct_quantize_xxx_template(..., direct_zigzag), and create 
>> dct_quantize_xxx_direct and dct_quantize_xxx_normal.
> 
> Iam not in favor of the extra complexity.
> At least not without clear benchmarks & object file sizes showing that this
> is worth it ...
> Also as already said zigzag_direct is used 99% of the time so the other
> cases simply do not look very relevant to me ...

Is this patch is not accepted, is it ok to duplicate the hack to check 
for last_nonzero_coeff in the mimic encoder? If not, what other way 
around do you suggest?

Ramiro Polla