[FFmpeg-devel] Fix MMX dct_quantize for non zigzag_direct scans

Thu May 15 01:48:05 CEST 2008

On Wed, May 14, 2008 at 11:25:43PM +0100, Ramiro Polla wrote:
> Hello,
>
>>>> and id like to see benchmarks as well, so we can be sure this doesnt
>>>> slow the code down
>>> Adding custom permutation code for ff_alternate_vertical_scan the same 
>>> way it's done for ff_zigzag_direct gives these results (the source file 
>>> is properly cached in memory, 10 runs removing the highest and lowest):
>>>
>>> ./ffmpeg_g -benchmark -s 352x288 -i paris.cif -flags +alt -vcodec mpeg4 
>>> -y -f rawvideo /dev/null
>>>
>>>       ref     new
>>> avg   2.7905  2.7390
>>> stdev 0.0318  0.0287
>> I do not care at all about alternate_scan speed. I care about 
>> zigzag_direct
>> speed! Its whats used 99% of the time.
>> even a 0.1% speedloss for zigzag means rejected patch!
>
> Tested with a bigger cif sample.
>
>       ref     new
> avg   7.9774  7.9652
> stdev 0.0433  0.0396
>
> I ran the tests on init 1 with everything that I could stop (no network, no 
> log daemon, no fs journaling, no cron, no bunch of daemons...), and 
> couldn't get the stdev to be below the 0.1% you want.

:(

what values does:
for(i=0; i<N; i++)
    for(j=0; j<N; j++){
        if(time_ref[i] < time_new[j])
            ref_better++;
        if(time_ref[i] > time_new[j])
            new_better++;
    }

result in?

>
> Benchmarking with START/STOP_TIMER isn't very good since the runs can vary 
> on the time they take depending on last_non_zero. Also the patch changes 
> not only the MMX code but removes the hack in mpegvideo_enc.c.

*_TIMER around the macroblock encode function seems an option
also the automatic threshold calculation could be replaced by a constant
to avoid excessive skips.

>
> The changes for direct zigzag are basically:
> - read inverse from context instead of from static array. (shouldn't slow 
> anything down)
> - remove an if(s->alternate_scan) from mpegvideo_enc.c
> - add an if(s->intra_scantable.scantable == ff_zigzag_direct) to MMX code.

ill probably ok this if the benchmarks are inconclusive ...

>
> I think this is a good solution that shows no measurable speed loss. I can 
> test more if someone knows a way to make incredibly accurate benchmarking 
> (<0.1% stdev).
>
> We could also have two functions of each, one hardcoded to direct zigzag 
> and another more generic, that can be set in MPV_common_init_mmx() (under 
> #ifdef CONFIG_SMALL). I attached an example of what could be done. Another 
> way would be if my first patch is accepted, have an av_always_inline 
> dct_quantize_xxx_template(..., direct_zigzag), and create 
> dct_quantize_xxx_direct and dct_quantize_xxx_normal.

Iam not in favor of the extra complexity.
At least not without clear benchmarks & object file sizes showing that this
is worth it ...
Also as already said zigzag_direct is used 99% of the time so the other
cases simply do not look very relevant to me ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Old school: Use the lowest level language in which you can solve the problem
            conveniently.
New school: Use the highest level language in which the latest supercomputer
            can solve the problem without the user falling asleep waiting.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080515/0a0b5d76/attachment.pgp>