[FFmpeg-devel] Fix MMX dct_quantize for non zigzag_direct scans
Michael Niedermayer
michaelni
Thu May 15 01:48:05 CEST 2008
On Wed, May 14, 2008 at 11:25:43PM +0100, Ramiro Polla wrote:
> Hello,
>
>>>> and id like to see benchmarks as well, so we can be sure this doesnt
>>>> slow the code down
>>> Adding custom permutation code for ff_alternate_vertical_scan the same
>>> way it's done for ff_zigzag_direct gives these results (the source file
>>> is properly cached in memory, 10 runs removing the highest and lowest):
>>>
>>> ./ffmpeg_g -benchmark -s 352x288 -i paris.cif -flags +alt -vcodec mpeg4
>>> -y -f rawvideo /dev/null
>>>
>>> ref new
>>> avg 2.7905 2.7390
>>> stdev 0.0318 0.0287
>> I do not care at all about alternate_scan speed. I care about
>> zigzag_direct
>> speed! Its whats used 99% of the time.
>> even a 0.1% speedloss for zigzag means rejected patch!
>
> Tested with a bigger cif sample.
>
> ref new
> avg 7.9774 7.9652
> stdev 0.0433 0.0396
>
> I ran the tests on init 1 with everything that I could stop (no network, no
> log daemon, no fs journaling, no cron, no bunch of daemons...), and
> couldn't get the stdev to be below the 0.1% you want.
:(
what values does:
for(i=0; i<N; i++)
for(j=0; j<N; j++){
if(time_ref[i] < time_new[j])
ref_better++;
if(time_ref[i] > time_new[j])
new_better++;
}
result in?
>
> Benchmarking with START/STOP_TIMER isn't very good since the runs can vary
> on the time they take depending on last_non_zero. Also the patch changes
> not only the MMX code but removes the hack in mpegvideo_enc.c.
*_TIMER around the macroblock encode function seems an option
also the automatic threshold calculation could be replaced by a constant
to avoid excessive skips.
>
> The changes for direct zigzag are basically:
> - read inverse from context instead of from static array. (shouldn't slow
> anything down)
> - remove an if(s->alternate_scan) from mpegvideo_enc.c
> - add an if(s->intra_scantable.scantable == ff_zigzag_direct) to MMX code.
ill probably ok this if the benchmarks are inconclusive ...
>
> I think this is a good solution that shows no measurable speed loss. I can
> test more if someone knows a way to make incredibly accurate benchmarking
> (<0.1% stdev).
>
> We could also have two functions of each, one hardcoded to direct zigzag
> and another more generic, that can be set in MPV_common_init_mmx() (under
> #ifdef CONFIG_SMALL). I attached an example of what could be done. Another
> way would be if my first patch is accepted, have an av_always_inline
> dct_quantize_xxx_template(..., direct_zigzag), and create
> dct_quantize_xxx_direct and dct_quantize_xxx_normal.
Iam not in favor of the extra complexity.
At least not without clear benchmarks & object file sizes showing that this
is worth it ...
Also as already said zigzag_direct is used 99% of the time so the other
cases simply do not look very relevant to me ...
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Old school: Use the lowest level language in which you can solve the problem
conveniently.
New school: Use the highest level language in which the latest supercomputer
can solve the problem without the user falling asleep waiting.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080515/0a0b5d76/attachment.pgp>
More information about the ffmpeg-devel
mailing list