[Ffmpeg-devel] VP3/Theora Perfection
Måns Rullgård
mru at inprovide.com
Thu May 19 12:41:22 CEST 2005
Mike Melanson <mike at multimedia.cx> writes:
> Hi,
> I have replaced unpack_token() with a series of lookup tables
> in vp3.c. Now vp3data.h has more lines than vp3.c. Again,
> please test as I do not have great testing facilities right
> now. However, I did run a series of tests that validated a
> bunch of decoded tokens against the old function.
>
> Numbers for the speed freaks:
>
> [original]
> 1223 dezicycles in unpack_token, 32757 runs, 11 skips
> 1202 dezicycles in unpack_token, 65512 runs, 24 skips
> [new]
> 845 dezicycles in unpack_token, 32735 runs, 33 skips
> 841 dezicycles in unpack_token, 65466 runs, 70 skips
>
> What should I optimize next?
Perhaps some profiling data can give some hints:
samples  %        image name                symbol name
79906    20.6758  libc-2.3.4.so             (no symbols)
64232    16.6201  libavcodec-0.4.9-pre1.so  apply_loop_filter
62827    16.2566  libavcodec-0.4.9-pre1.so  unpack_vlcs
58066    15.0247  libavcodec-0.4.9-pre1.so  render_fragments
26620     6.8880  libavcodec-0.4.9-pre1.so  put_pixels8_mmx
21309     5.5137  libavcodec-0.4.9-pre1.so  ff_vp3_idct_sse2
18442     4.7719  libavcodec-0.4.9-pre1.so  reverse_dc_prediction
 8021     2.0754  libavcodec-0.4.9-pre1.so  unpack_superblocks
 6187     1.6009  libavcodec-0.4.9-pre1.so  __udivdi3
 5489     1.4203  libavcodec-0.4.9-pre1.so  unpack_vectors
 5093     1.3178  libavcodec-0.4.9-pre1.so  unpack_modes
 4986     1.2901  libavcodec-0.4.9-pre1.so  put_no_rnd_pixels8_l2_c
 4801     1.2423  libavcodec-0.4.9-pre1.so  vp3_decode_frame
 4523     1.1703  libavcodec-0.4.9-pre1.so  ff_vp3_idct_add_sse2
 2342     0.6060  libavcodec-0.4.9-pre1.so  put_no_rnd_pixels8_y2_mmx2
 2094     0.5418  libavcodec-0.4.9-pre1.so  put_no_rnd_pixels8_x2_mmx2
Any idea what is being called in libc? I guess it's memcpy and/or
memset.
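
As an aside on the lookup-table change quoted above, here is a minimal
sketch of the general idea: a branch-based token decoder, the same
mapping as a precomputed table, and an exhaustive check of one against
the other. The token set, the TokenInfo struct and all function names
below are made up for illustration; they are not the real VP3 tokens
or the code in vp3.c/vp3data.h.

```c
#include <stdint.h>

/* Hypothetical toy token set, NOT the real VP3 token table:
 * tokens 0..3 carry a literal coefficient, tokens 4..7 are zero runs. */
typedef struct {
    int8_t  coeff;     /* decoded coefficient (0 for a pure zero run) */
    uint8_t zero_run;  /* number of zero coefficients to emit */
} TokenInfo;

/* Old style: decode each token through a chain of branches. */
static TokenInfo unpack_token_branchy(unsigned token)
{
    TokenInfo t = { 0, 0 };

    switch (token & 7) {
    case 0:  t.coeff =  1; break;
    case 1:  t.coeff = -1; break;
    case 2:  t.coeff =  2; break;
    case 3:  t.coeff = -2; break;
    default: /* tokens 4..7 map to zero runs of length 1..4 */
        t.zero_run = (uint8_t)((token & 7) - 3);
        break;
    }
    return t;
}

/* New style: the same mapping, precomputed once as a table. */
static const TokenInfo token_table[8] = {
    {  1, 0 }, { -1, 0 }, {  2, 0 }, { -2, 0 },
    {  0, 1 }, {  0, 2 }, {  0, 3 }, {  0, 4 },
};

static TokenInfo unpack_token_table(unsigned token)
{
    return token_table[token & 7];
}

/* Compare both decoders over every possible token.
 * Returns the number of tokens on which they disagree. */
static int count_mismatches(void)
{
    int bad = 0;

    for (unsigned token = 0; token < 8; token++) {
        TokenInfo a = unpack_token_branchy(token);
        TokenInfo b = unpack_token_table(token);

        if (a.coeff != b.coeff || a.zero_run != b.zero_run)
            bad++;
    }
    return bad;
}
```

Calling count_mismatches() from a small test program and checking that
it returns 0 is the toy equivalent of validating the decoded tokens
against the old function. The table version trades a larger data
header for fewer branches in the hot path.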
--
Måns Rullgård
mru at inprovide.com