[FFmpeg-devel] [PATCH] h264 parallelized
Andreas Öman
andreas
Sun Sep 2 09:50:01 CEST 2007
Michael,
Michael Niedermayer wrote:
> Hi
>
>> I've tried another file (Aladin.mpg, 995 frames, 352x240; the other file
>> was 538 frames, 160x128)
>> svn : 0m10.828s, 0m10.777s, 0m10.848s, 0m10.799s, 0m10.742s avg:10.799
>> patch: 0m10.770s, 0m10.777s, 0m10.831s, 0m10.918s, 0m10.778s avg:10.815
>>
>> I'll do more tests.
>
> I've tried the first file concatenated 5 times:
> 0m3.669s, 0m3.696s, 0m3.674s, 0m3.700s, 0m3.724s avg:3.693
> 0m3.781s, 0m3.782s, 0m3.770s, 0m3.797s, 0m3.776s avg:3.781
>
> This should rule out any run-once init code as a possible cause.
>
I'm stumbling around this problem a bit and haven't been able to
reproduce the slowdown on any of my systems. If anything, the patch
is mostly faster.
10 rounds of decoding (without audio), user time in seconds:
Aladin.mpg:
  Intel(R) Pentium(R) M processor 1.73GHz
    unmodified: avg:  2.658  stddev: 0.053  med:  2.672
    patched:    avg:  2.673  stddev: 0.014  med:  2.676
  AMD Sempron(tm) Processor 2800+
    unmodified: avg:  3.670  stddev: 0.033  med:  3.670
    patched:    avg:  3.511  stddev: 0.055  med:  3.500
apple zodiac trailer:
  Intel(R) Pentium(R) M processor 1.73GHz
    unmodified: avg: 67.354  stddev: 0.132  med: 67.370
    patched:    avg: 66.801  stddev: 0.371  med: 66.642
  AMD Sempron(tm) Processor 2800+
    unmodified: avg: 78.481  stddev: 0.543  med: 78.485
    patched:    avg: 76.089  stddev: 0.293  med: 76.090
All tests have been run with a vanilla ./configure build.
I've run tests with valgrind's cachegrind and can't see any difference.
A gprof build won't compile with the optimized CABAC support (the 7regs
requirement conflicts with function instrumentation). Then again, since
I can't even reproduce the slowdown, I doubt profiling would give me
any usable feedback.
The symbol sizes reported by nm don't differ much either; see the
diff below.
I've tried rearranging the added functions to check for inlining
issues, but the speed barely changes.
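One way to take source order out of the picture is to pin each inlining decision explicitly and time both variants. A sketch using the GCC/Clang `always_inline` and `noinline` function attributes; `clamp_qp`, `sum_clamped` and the `TRY_INLINE` macro are hypothetical and not part of the patch:

```c
/* Choose the inlining decision at build time instead of leaving it to
 * the compiler's heuristics (which can shift when functions move). */
#ifdef TRY_INLINE
#define HELPER static inline __attribute__((always_inline))
#else
#define HELPER static __attribute__((noinline))
#endif

/* Hypothetical small helper: clamps a QP to the H.264 range 0..51. */
HELPER int clamp_qp(int qp)
{
    return qp < 0 ? 0 : qp > 51 ? 51 : qp;
}

/* Hypothetical hot-path caller whose speed is compared across builds. */
static int sum_clamped(const int *qp, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += clamp_qp(qp[i]);
    return sum;
}
```

Building once with -DTRY_INLINE and once without yields two binaries whose intended difference is only whether the helper is inlined, so a timing gap between them points at inlining rather than at layout.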
If you (or anyone else) have any ideas, I'd be happy to hear them :-)
Otherwise, I'll just have to drop the patch on the floor
(or let it linger until I come up with an idea, or stumble across
a machine where it slows down).
--- /tmp/unmodified.symbols 2007-09-02 09:37:07.000000000 +0200
+++ /tmp/patched.symbols 2007-09-02 09:37:01.000000000 +0200
@@ -1,4 +1,4 @@
-00000315 t alloc_tables
+000002b5 t alloc_tables
0000009c r alpha_table
0000005c r b_mb_type_info
00000034 r b_sub_mb_type_info
@@ -14,26 +14,27 @@
0000000c r chroma_dc_total_zeros_len
00000030 b chroma_dc_total_zeros_vlc
00000034 r chroma_qp
+0000012f t clone_slice
00000110 r coeff_token_bits
00000110 r coeff_token_len
00000040 b coeff_token_vlc
000001d1 t decode_cabac_intra_mb_type
00000704 t decode_cabac_mb_mvd
00001233 t decode_cabac_residual
00000040 t decode_end
000015f2 t decode_frame
-00000f3c t decode_init
-000075f7 t decode_mb_cabac
-00006103 t decode_mb_cavlc
+00000f49 t decode_init
+000075b7 t decode_mb_cabac
+00006137 t decode_mb_cavlc
00000acf t decode_mb_skip
-00001667 t decode_nal_units
+00001a7e t decode_nal_units
00000e8a t decode_ref_pic_list_reordering
0000088b t decode_residual
00000d8b t decode_scaling_matrices
00001a97 t decode_seq_parameter_set
00000706 t decode_slice
-0000384a t decode_slice_header
+00002b84 t decode_slice_header
00000020 r default_scaling4
00000080 r default_scaling8
00000012 r dequant4_coeff_init
@@ -60,15 +61,17 @@
00000010 r field_scan
00000040 r field_scan8x8
00000040 r field_scan8x8_cavlc
-00001fee t fill_caches
+00001ffe t fill_caches
+00000f75 t fill_default_ref_list
+00000240 t fill_mbaff_ref_list
00001e1c t filter_mb
000003a0 t filter_mb_edgeh
000002bf t filter_mb_edgev
00002651 t filter_mb_fast
00000202 t filter_mb_mbaff_edgecv
000002da t flush_dpb
-000002e1 t frame_start
-0000010f t free_tables
+000002fe t frame_start
+00000145 t free_tables
000000ae t get_cabac_noinline
00000030 r golomb_to_inter_cbp
00000030 r golomb_to_intra4x4_cbp
@@ -76,10 +79,11 @@
00000034 D h264_decoder
000002f6 t h264_luma_dc_dequant_idct_c
0000270f t hl_decode_mb_complex
-0000124b t hl_decode_mb_simple
+00001244 t hl_decode_mb_simple
00000c87 t hl_motion
00000068 r i_mb_type_info
00000613 t init_dequant_tables
+000004cb t init_scan_tables
0000003f r last_coeff_flag_offset_8x8
00000010 r luma_dc_zigzag_scan
00001e0f t mc_part
@@ -116,7 +120,7 @@
0000072c t svq3_add_idct_c
00000040 r svq3_dct_tables
000008c8 t svq3_decode_frame
-00002392 t svq3_decode_mb
+00002372 t svq3_decode_mb
000004cd t svq3_decode_slice_header
00000034 D svq3_decoder
00000080 r svq3_dequant_coeff
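The nm listing above is easier to read as signed per-symbol size deltas. A minimal sketch, assuming each line has the form `<hex size> <type> <name>` as in the listing; `sym_size` and `sym_delta` are hypothetical helper names, not FFmpeg code:

```c
#include <stdio.h>
#include <string.h>

/* Look up `name` in an array of nm-style lines "<hex size> <type> <name>"
 * and return its size, or 0 if the symbol is absent. */
static unsigned long sym_size(const char *const *lines, size_t n,
                              const char *name)
{
    for (size_t i = 0; i < n; i++) {
        unsigned long size;
        char type, sym[128];
        if (sscanf(lines[i], "%lx %c %127s", &size, &type, sym) == 3 &&
            strcmp(sym, name) == 0)
            return size;
    }
    return 0;
}

/* Size change of `name` in bytes from `before` to `after`.
 * Negative means the symbol shrank; a symbol missing from `before`
 * (new in the patched build) counts from zero. */
static long sym_delta(const char *const *before, size_t nb,
                      const char *const *after, size_t na,
                      const char *name)
{
    return (long)sym_size(after, na, name) - (long)sym_size(before, nb, name);
}
```

For example, decode_nal_units grows by 0x1a7e - 0x1667 = 1047 bytes, while decode_slice_header shrinks by 0x384a - 0x2b84 = 3270 bytes; new symbols such as clone_slice count in full.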