[FFmpeg-devel] Fix MMX dct_quantize for non zigzag_direct scans
Ramiro Polla
ramiro
Fri May 16 01:55:03 CEST 2008
Hi,
Michael Niedermayer wrote:
> On Wed, May 14, 2008 at 11:25:43PM +0100, Ramiro Polla wrote:
>> Hello,
>>
>>>>> and id like to see benchmarks as well, so we can be sure this doesnt
>>>>> slow the code down
>>>> Adding custom permutation code for ff_alternate_vertical_scan the same
>>>> way it's done for ff_zigzag_direct gives these results (the source file
>>>> is properly cached in memory, 10 runs removing the highest and lowest):
>>>>
>>>> ./ffmpeg_g -benchmark -s 352x288 -i paris.cif -flags +alt -vcodec mpeg4
>>>> -y -f rawvideo /dev/null
>>>>
>>>> ref new
>>>> avg 2.7905 2.7390
>>>> stdev 0.0318 0.0287
>>> I do not care at all about alternate_scan speed. I care about
>>> zigzag_direct
>>> speed! Its whats used 99% of the time.
>>> even a 0.1% speedloss for zigzag means rejected patch!
>> Tested with a bigger cif sample.
>>
>> ref new
>> avg 7.9774 7.9652
>> stdev 0.0433 0.0396
>>
>> I ran the tests on init 1 with everything that I could stop (no network, no
>> log daemon, no fs journaling, no cron, no bunch of daemons...), and
>> couldn't get the stdev to be below the 0.1% you want.
>
> :(
>
> what values does:
> for(i=0; i<N; i++)
> for(j=0; j<N; j++){
> if(time_ref[i] < time_new[j])
> ref_better++;
> if(time_ref[i] > time_new[j])
> new_better++;
> }
>
> result in?
new tests with:
./ffmpeg_g -v 2 -s 352x288 -i paris.cif -vcodec mpeg1video -f rawvideo
-y /dev/null
in init 1 with lots of stuff stopped.
ref_better: 15
new_better: 49
ref new
avg 7.3245 7.2870
stdev 0.0457 0.0342
>> Benchmarking with START/STOP_TIMER isn't very good since the runs can vary
>> on the time they take depending on last_non_zero. Also the patch changes
>> not only the MMX code but removes the hack in mpegvideo_enc.c.
>
> *_TIMER around the macroblock encode function seems an option
> also the automatic threshold calculation could be replaced by a constant
> to avoid excessive skips.
I added *_TIMERs in first and last line of encode_mb_internal. These are
the results:
around 1048576 runs
ref new
time skips time skips
42612 3033 44422 2889
42551 2981 43440 2712
42553 2846 43711 2991
41825 2895 44401 3036
41812 2925 43691 2878
41792 2858 43137 2873
42458 2973 43482 2945
42388 3078 43553 2766
42366 2888 43450 2987
42721 3003 43163 2996
42210 2964 43956 2954
42299 43674 avg
341.8448 433.0284 stdev
>> The changes for direct zigzag are basically:
>> - read inverse from context instead of from static array. (shouldn't slow
>> anything down)
>> - remove an if(s->alternate_scan) from mpegvideo_enc.c
>> - add an if(s->intra_scantable.scantable == ff_zigzag_direct) to MMX code.
>
> ill probably ok this if the benchmarks are inconclusive ...
>
>
>
>> I think this is a good solution that shows no measurable speed loss. I can
>> test more if someone knows a way to make incredibly accurate benchmarking
>> (<0.1% stdev).
>>
>> We could also have two functions of each, one hardcoded to direct zigzag
>> and another more generic, that can be set in MPV_common_init_mmx() (under
>> #ifdef CONFIG_SMALL). I attached an example of what could be done. Another
>> way would be if my first patch is accepted, have an av_always_inline
>> dct_quantize_xxx_template(..., direct_zigzag), and create
>> dct_quantize_xxx_direct and dct_quantize_xxx_normal.
>
> Iam not in favor of the extra complexity.
> At least not without clear benchmarks & object file sizes showing that this
> is worth it ...
> Also as already said zigzag_direct is used 99% of the time so the other
> cases simply do not look very relevant to me ...
Is this patch is not accepted, is it ok to duplicate the hack to check
for last_nonzero_coeff in the mimic encoder? If not, what other way
around do you suggest?
Ramiro Polla
More information about the ffmpeg-devel
mailing list