[FFmpeg-devel] [PATCH] Make VP3/Theora Decoder Much Faster
Jason Garrett-Glaser
darkshikari
Wed Dec 2 09:41:25 CET 2009
Another optimization patch attached. Should be pretty obvious what it does.
Also, a few optimization targets for Mike:
1. unpack_vectors is atrociously inefficient crap. It makes up 10%
of decoding time and can be made at least twice as fast.
2. The fragment/superblock index error checking all over the place
seems redundant and likely prevents significant future optimizations.
3. There's no asm version of put_no_rnd_pixels8_l2 ...
4. Motion vector handling seems to be done in a very silly fashion,
with all 16x16 partitions being treated as groups of 8x8 partitions.
This accordingly prevents faster 16x16 motion compensation functions
from being used in 16x16 partitions, despite 8x8 partitions being
rarer.
5. There's a huge amount of if(x>0) and if(y>0) and if(x<width-1) and
so forth. Why not just pad the edges of these data structures and
eliminate the conditionals all over the place?
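A rough sketch of the padding idea in item 5, assuming a per-fragment
array of ints. PaddedGrid, grid_init and the two sum_neighbours
functions are made-up names for illustration and are not taken from
vp3.c. The grid gets a one-cell border filled with a neutral value, so
neighbour reads need no bounds checks:

```c
#include <stdlib.h>

/* Hypothetical padded grid: (w + 2) x (h + 2) cells are allocated and the
 * one-cell border holds a sentinel, so reading any direct neighbour of a
 * logical cell stays inside the allocation. */
typedef struct PaddedGrid {
    int w, h;     /* logical size */
    int stride;   /* w + 2 */
    int *base;    /* start of the allocation */
    int *cells;   /* points at logical cell (0, 0) */
} PaddedGrid;

static int grid_init(PaddedGrid *g, int w, int h, int sentinel)
{
    int i, n;

    g->w      = w;
    g->h      = h;
    g->stride = w + 2;
    n         = g->stride * (h + 2);
    g->base   = malloc(n * sizeof(*g->base));
    if (!g->base)
        return -1;
    for (i = 0; i < n; i++)
        g->base[i] = sentinel;             /* border and interior alike */
    g->cells = g->base + g->stride + 1;    /* skip top row and left column */
    return 0;
}

static void grid_free(PaddedGrid *g)
{
    free(g->base);
    g->base = g->cells = NULL;
}

/* Unpadded layout: every neighbour access is guarded by a conditional. */
static int sum_neighbours_checked(const int *cells, int w, int h, int x, int y)
{
    int sum = 0;
    if (x > 0)     sum += cells[y * w + x - 1];
    if (x < w - 1) sum += cells[y * w + x + 1];
    if (y > 0)     sum += cells[(y - 1) * w + x];
    if (y < h - 1) sum += cells[(y + 1) * w + x];
    return sum;
}

/* Padded layout: same result when the sentinel is 0, with no branches. */
static int sum_neighbours_padded(const PaddedGrid *g, int x, int y)
{
    const int *c = g->cells + y * g->stride + x;
    return c[-1] + c[1] + c[-g->stride] + c[g->stride];
}
```

The cost is (w + 2) * (h + 2) cells instead of w * h, and the sentinel
has to be a value that is neutral for whatever the inner loop computes
(0 for a sum, a "not coded" flag for a coded-neighbour test, and so on).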
I can think of more, but I'm lazy. Also, here's a profile with
-fno-inline-functions and -fno-inline-functions-called-once, on a Core
i7 (from before my changes, but after Mike's):
samples  %        symbol name
4914     22.3709  unpack_vlcs
2392     10.8896  reverse_dc_prediction
2102      9.5693  render_slice
2031      9.2461  unpack_vectors
2008      9.1414  ff_vp3_idct_put_sse2
1397      6.3598  ff_vp3_h_loop_filter_mmx2
1096      4.9895  vp3_decode_frame
 959      4.3658  ff_vp3_idct_add_sse2
 873      3.9743  put_pixels8_mmx
 867      3.9470  ff_vp3_v_loop_filter_mmx2
 782      3.5600  unpack_superblocks
 525      2.3901  put_no_rnd_pixels8_l2_c
 512      2.3309  apply_loop_filter
 370      1.6844  unpack_modes
 278      1.2656  add_pixels_clamped_mmx
 251      1.1427  put_signed_pixels_clamped_mmx
 182      0.8286  put_no_rnd_pixels8_y2_mmx2
 160      0.7284  put_no_rnd_pixels8_x2_mmx2
 112      0.5099  clear_block_sse
  76      0.3460  ff_emulated_edge_mc
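
For context on item 3 and the put_no_rnd_pixels8_l2_c row in the
profile: the operation is an 8-pixel-wide average of two source blocks
that rounds down rather than to nearest. Below is a minimal sketch of
that operation, not the FFmpeg source. The function names and the
single shared stride are assumptions for illustration. The second
version handles four pixels per 32-bit word, and a SIMD version would
apply the same per-byte average to a whole row per instruction:

```c
#include <stdint.h>
#include <string.h>

/* Scalar reference: dst = floor((a + b) / 2) for each of 8 pixels per row. */
static void avg2_no_rnd_8_scalar(uint8_t *dst, const uint8_t *a,
                                 const uint8_t *b, int stride, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((a[x] + b[x]) >> 1);
        dst += stride;
        a   += stride;
        b   += stride;
    }
}

/* Four byte-wise floor averages in one 32-bit word.
 * Uses a + b = 2 * (a & b) + (a ^ b); the 0xFE mask keeps the shifted
 * low bit of each byte from leaking into the byte below it. */
static uint32_t no_rnd_avg4(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Same result as the scalar version, four pixels at a time. */
static void avg2_no_rnd_8_swar(uint8_t *dst, const uint8_t *a,
                               const uint8_t *b, int stride, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x += 4) {
            uint32_t va, vb, vr;
            memcpy(&va, a + x, 4);   /* memcpy avoids unaligned loads */
            memcpy(&vb, b + x, 4);
            vr = no_rnd_avg4(va, vb);
            memcpy(dst + x, &vr, 4);
        }
        dst += stride;
        a   += stride;
        b   += stride;
    }
}
```
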
Dark Shikari
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vp3opts.diff
Type: application/octet-stream
Size: 7319 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20091202/0ca54b0a/attachment.obj>