[FFmpeg-devel] Once again: Multithreaded H.264 decoding with ffmpeg?
Jason Garrett-Glaser
darkshikari
Fri May 30 07:52:29 CEST 2008
>> I have been looking into the h264 code and each piece of H.264
>> documentation I could get my hands on. And I have the impression that
>> some of the decoding steps (namely residual decoding, deblocking) could
>> be parallelized quite well. But I don't have any idea how much time the
>> individual decoding steps take. Does someone happen to have some
>> numbers? Or a hint how to measure this myself?
[Profile courtesy of Loren Merritt]
ffh264 svn-r11870 (2008-02-04)
CPU: Core 2, speed 2400.75 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
unit mask of 0x00 (Unhalted core cycles) count 100000
samples % symbol name
168093 9.2010 decode_mb_cabac
165494 9.0587 decode_cabac_residual
133817 7.3248 fill_caches
115161 6.3036 hl_decode_mb_simple
111744 6.1166 h264_#_loop_filter_luma_mmx2
101511 5.5565 put_h264_chroma_mc8_mmx
88055 4.8199 h264_#_loop_filter_chroma_mmx2
72618 3.9749 filter_mb_fast
70919 3.8819 get_cabac_noinline
67392 3.6889 put_h264_qpel8_h_lowpass_l2_mmx2
66962 3.6653 put_h264_qpel8or16_v_lowpass_mmx2
64123 3.5100 filter_mb_edge#
53187 2.9113 put_h264_qpel16_mc##_mmx2
52559 2.8770 h264_loop_filter_strength_mmx2
47997 2.6272 decode_cabac_mb_mvd
42554 2.3293 decode_mb_skip
39814 2.1793 mc_dir_part
39220 2.1468 hl_motion
35509 1.9437 clear_blocks_mmx
33781 1.8491 prefetch_mmx2
32510 1.7795 put_h264_chroma_mc4_mmx
32389 1.7729 put_h264_qpel8or16_hv_lowpass_mmx2
25840 1.4144 pred_direct_motion
23767 1.3009 put_h264_qpel8or16_vh_lowpass_mmx2
14891 0.8151 ff_h264_idct8_add_sse2
14235 0.7792 decode_slice
12522 0.6854 put_h264_qpel8_h_lowpass_mmx2
10993 0.6014 put_h264_qpel8_mc##_mmx2
9992 0.5469 decode_cabac_mb_skip
9958 0.5451 avg_h264_qpel8_h_lowpass_l2_mmx2
8960 0.4905 pred8x8l_#
6585 0.3604 filter_mb
6516 0.3567 ff_h264_idct_dc_add_mmx2
6290 0.3443 draw_edges_mmx
6102 0.3340 mc_part
4565 0.2499 put_pixels8_l2_shift5_mmx2
4108 0.2249 ff_h264_biweight_#x#_mmx2
3413 0.1868 ff_h264_idct8_dc_add_mmx2
3253 0.1781 pred8x8c_#
3159 0.1729 ff_h264_idct_add_mmx
2809 0.1538 avg_h264_qpel8_h_lowpass_mmx2
2210 0.1210 avg_h264_qpel8or16_v_lowpass_mmx2
2179 0.1193 avg_h264_qpel8or16_hv_lowpass_mmx2
1667 0.0912 decode_nal_units
1552 0.0851 pred4x4_#
891 0.0488 decode_cabac_intra_mb_type
761 0.0417 avg_pixels8_l2_shift5_mmx2
528 0.0289 pred16x16_#
376 0.0206 decode_slice_header
304 0.0166 ff_emulated_edge_mc
239 0.0131 h264_luma_dc_dequant_idct_c
224 0.0123 decode_frame
165 0.0090 MPV_frame_start
133 0.0073 ff_draw_horiz_band
124 0.0068 video_read_frame
119 0.0065 fill_default_ref_list
108 0.0059 handle_block
102 0.0056 fast_memcpy
97 0.0053 decode_ref_pic_list_reordering
84 0.0046 ff_init_cabac_states
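
To get from a listing like this to "how much time does each decoding step take", the per-symbol percentages can be summed by stage. Below is a minimal sketch in Python, assuming the three-column layout shown above (samples, percent, symbol name). The stage names and the regex patterns that assign symbols to stages are guesses made from the function names, not anything taken from the ffmpeg source, so treat the grouping as hypothetical and adjust it to taste.

```python
import re
from collections import defaultdict

# Hypothetical stage grouping, guessed from the symbol names in the listing.
# The first matching pattern wins; anything unmatched is counted as "other".
STAGES = [
    ("deblocking", re.compile(r"loop_filter|filter_mb")),
    ("entropy decoding (CABAC)", re.compile(r"cabac")),
    ("motion compensation",
     re.compile(r"qpel|chroma_mc|mc_dir_part|mc_part|hl_motion|biweight|"
                r"pixels8_l2|emulated_edge_mc")),
    ("idct / residual", re.compile(r"idct|clear_blocks")),
    ("intra prediction", re.compile(r"^pred\d")),
]


def stage_of(symbol):
    """Return the name of the first stage whose pattern matches the symbol."""
    for name, pattern in STAGES:
        if pattern.search(symbol):
            return name
    return "other"


def summarize(report_text):
    """Sum the percent column per stage for 'samples percent symbol' lines."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skips the header and any wrapped or blank lines
        _samples, percent, symbol = parts
        try:
            value = float(percent)
        except ValueError:
            continue  # not a data row
        totals[stage_of(symbol)] += value
    return dict(totals)


if __name__ == "__main__":
    # A few rows copied from the listing above, just to show the input shape.
    sample = """\
168093 9.2010 decode_mb_cabac
165494 9.0587 decode_cabac_residual
111744 6.1166 h264_#_loop_filter_luma_mmx2
101511 5.5565 put_h264_chroma_mc8_mmx
"""
    for stage, percent in sorted(summarize(sample).items(),
                                 key=lambda kv: -kv[1]):
        print(f"{percent:7.4f}  {stage}")
```

Pass the full listing as one string to summarize() to get totals for the whole profile. Functions such as fill_caches and hl_decode_mb_simple do not belong cleanly to one stage, so they land in "other" here; where to count them is a judgment call.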
Dark Shikari