[Ffmpeg-devel] VP3/Theora Perfection
Måns Rullgård
mru
Mon May 16 22:58:21 CEST 2005
Michael Niedermayer <michaelni at gmx.at> writes:
> Hi
>
> On Monday 16 May 2005 15:30, Mike Melanson wrote:
>> Diego Biurrun wrote:
>> > What samples are you using to test? Your last commit fixed vp31.avi and
>> > all the other samples on mphq decode flawlessly now, albeit the FFmpeg
>> > decoder takes 2-3 times the CPU of the binary decoder.
>>
>> Not surprising. The loop filter is quite computationally intensive
>> (32-64 new multiplications per coded fragment). The original On2 source
>
> You mean the 32 *1 and 32 *3 ones? *3 is just x+x+x or x+(x<<1), and gcc will
> make that change for you
>
> anyway, vp3.c is very inefficiently written
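
To make the *3 point above concrete, here is a minimal sketch. The
function names are made up for illustration and are not taken from
vp3.c. It shows the three equivalent ways of writing a multiplication
by 3; an optimizing compiler will typically pick whichever form is
cheapest on the target, so writing x * 3 in the source should not by
itself cost a real multiply.

```c
/* Hypothetical helpers, not code from vp3.c. */

/* Plain multiplication, as one would naturally write it. */
static int times3_mul(int x)
{
    return x * 3;
}

/* Two additions: x + x + x. Valid for negative x as well. */
static int times3_add(int x)
{
    return x + x + x;
}

/*
 * Shift and add: x + 2x. In C, left-shifting a negative int is
 * undefined behaviour, so this form is only safe for x >= 0
 * (e.g. raw 8-bit pixel values, not signed differences).
 */
static int times3_shift_add(int x)
{
    return x + (x << 1);
}
```

If the filter operates on signed differences, the x + x + x form (or
just x * 3) is the safer one to write by hand.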
Some profiling data that may be interesting:
samples  %        image name                symbol name
79102    17.5661  libc-2.3.4.so             (no symbols)
70556    15.6683  libavcodec-0.4.9-pre1.so  apply_loop_filter
55903    12.4143  libavcodec-0.4.9-pre1.so  unpack_vlcs
50236    11.1558  libavcodec-0.4.9-pre1.so  vp3_idct_sse2
41615     9.2414  libavcodec-0.4.9-pre1.so  vp3_decode_frame
33667     7.4764  libavcodec-0.4.9-pre1.so  put_pixels8_mmx
29593     6.5717  libavcodec-0.4.9-pre1.so  render_fragments
24284     5.3927  libavcodec-0.4.9-pre1.so  reverse_dc_prediction
13132     2.9162  libavcodec-0.4.9-pre1.so  unpack_superblocks
12537     2.7841  libavcodec-0.4.9-pre1.so  unpack_token
6847      1.5205  libavcodec-0.4.9-pre1.so  unpack_modes
5422      1.2041  libavcodec-0.4.9-pre1.so  put_no_rnd_pixels8_l2_c
4502      0.9998  libavcodec-0.4.9-pre1.so  add_pixels_clamped_mmx
4317      0.9587  libavcodec-0.4.9-pre1.so  unpack_vectors
2732      0.6067  libavcodec-0.4.9-pre1.so  put_no_rnd_pixels8_y2_mmx2
2279      0.5061  libavcodec-0.4.9-pre1.so  put_no_rnd_pixels8_x2_mmx2
1472      0.3269  libavcodec-0.4.9-pre1.so  ff_emulated_edge_mc
The libc time (17.6% of samples) is probably memcpy or memset, so I'd
go looking for unnecessary uses of those. As expected,
apply_loop_filter takes a lot of time (15.7%), as does unpack_vlcs
(12.4%).
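
As an illustration of the kind of unnecessary memcpy worth looking
for, here is a minimal sketch. The FrameStore struct and both function
names are hypothetical; I have not checked whether vp3.c does anything
like this. It contrasts copying a whole frame into a "last frame"
buffer with simply exchanging the two buffer pointers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical frame store; not the layout vp3.c actually uses. */
typedef struct {
    unsigned char *current; /* frame being decoded */
    unsigned char *last;    /* previous frame, used as reference */
    size_t size;            /* bytes per frame buffer */
} FrameStore;

/* Copying approach: moves 'size' bytes through memcpy every frame. */
static void keep_frame_by_copy(FrameStore *fs)
{
    memcpy(fs->last, fs->current, fs->size);
}

/* Swapping approach: constant time, the two buffers trade roles. */
static void keep_frame_by_swap(FrameStore *fs)
{
    unsigned char *tmp = fs->last;
    fs->last = fs->current;
    fs->current = tmp;
}
```

The swap only works if the decoder does not rely on "current" still
holding the previous picture afterwards, for example because uncoded
fragments are copied over from "last" explicitly.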
--
Måns Rullgård
mru at inprovide.com