[FFmpeg-devel] Once again: Multithreaded H.264 decoding with ffmpeg?
Alexander Strange
astrange
Fri May 30 08:04:42 CEST 2008
On May 30, 2008, at 1:52 AM, Jason Garrett-Glaser wrote:
>>> I have been looking into the h264 code and each piece of H.264
>>> documentation I could get my hands on. And I have the impression that
>>> some of the decoding steps (namely residual decoding, deblocking) could
>>> be parallelized quite well. But I don't have any idea how much time the
>>> individual decoding steps take. Does someone happen to have some
>>> numbers? Or a hint how to measure this myself?
>
> [Profile courtesy of Loren Merritt]
>
> ffh264 svn-r11870 (2008-02-04)
> CPU: Core 2, speed 2400.75 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
> unit mask of 0x00 (Unhalted core cycles) count 100000
> samples   %        symbol name
> 168093    9.2010   decode_mb_cabac
> 165494    9.0587   decode_cabac_residual
> 133817    7.3248   fill_caches
> 115161    6.3036   hl_decode_mb_simple
> 111744    6.1166   h264_#_loop_filter_luma_mmx2
> 101511    5.5565   put_h264_chroma_mc8_mmx
> 88055     4.8199   h264_#_loop_filter_chroma_mmx2
> 72618     3.9749   filter_mb_fast
> 70919     3.8819   get_cabac_noinline

There's been some work since then - Loren wrote SSSE3 MC functions and
the rest might be a bit better. I'd guess fill_caches and the loop
filter are more important now; if you want to look at those, it would
be great, but make sure you're good at assembly first.

(I have some patches in my head that will improve
decode_cabac_residual, but you'd like me to do frame multithreading
first, right?)
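
To see roughly how the quoted profile splits across decoding stages, the listed percentages can be summed per stage. The sketch below is only an illustration: the rows are copied from the r11870 profile above, but the grouping of symbols into "cabac" and "loop filter" is an assumption based on the symbol names, and only the symbols shown in the quote are counted, so the shares do not add up to 100%.

```python
# Rows copied from the quoted oprofile output: (symbol, percent of samples).
PROFILE = [
    ("decode_mb_cabac", 9.2010),
    ("decode_cabac_residual", 9.0587),
    ("fill_caches", 7.3248),
    ("hl_decode_mb_simple", 6.3036),
    ("h264_#_loop_filter_luma_mmx2", 6.1166),
    ("put_h264_chroma_mc8_mmx", 5.5565),
    ("h264_#_loop_filter_chroma_mmx2", 4.8199),
    ("filter_mb_fast", 3.9749),
    ("get_cabac_noinline", 3.8819),
]

# Hypothetical grouping, guessed from the symbol names (not part of the profile).
GROUPS = {
    "cabac": (
        "decode_mb_cabac",
        "decode_cabac_residual",
        "get_cabac_noinline",
    ),
    "loop filter": (
        "h264_#_loop_filter_luma_mmx2",
        "h264_#_loop_filter_chroma_mmx2",
        "filter_mb_fast",
    ),
}


def group_shares(profile, groups):
    """Sum the sample percentages per group; ungrouped symbols go to 'other'."""
    shares = {name: 0.0 for name in groups}
    shares["other"] = 0.0
    for symbol, percent in profile:
        for name, members in groups.items():
            if symbol in members:
                shares[name] += percent
                break
        else:
            # No group claimed this symbol.
            shares["other"] += percent
    return shares


if __name__ == "__main__":
    shares = group_shares(PROFILE, GROUPS)
    for name, percent in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} {percent:7.4f}%")
```

With this grouping the CABAC symbols come to about 22.1% of samples and the loop-filter symbols to about 14.9%, on the old revision; a fresh profile of current svn would be needed to check the guess that fill_caches and the loop filter matter more now.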