[Ffmpeg-devel] [PATCH] h264 optimization: common case hl_decode_mb
Michael Niedermayer
michaelni
Fri Feb 23 12:25:09 CET 2007
Hi
On Fri, Feb 23, 2007 at 02:08:26AM -0500, Alexander Strange wrote:
> I noticed that hl_decode_mb is near the top of profiling the h264
> decoder and is full of huge conditionals.
>
> This patch copies the function, with a new version that runs for the
> common case: no interlacing, grayscale decoding disabled, not
> encoding, and not decoding SVQ3.
>
> It has a very small, but significant speed gain on my test video,
> which is 1080p and 1.2MBit with I/P frames:
> BENCHMARKs: VC: 25.189s VO: 1.906s A: 0.000s Sys: 0.181s = 27.277s
> BENCHMARKs: VC: 25.188s VO: 1.889s A: 0.000s Sys: 0.180s = 27.257s
> BENCHMARKs: VC: 25.195s VO: 1.897s A: 0.000s Sys: 0.181s = 27.273s
> BENCHMARKs: VC: 25.192s VO: 1.898s A: 0.000s Sys: 0.182s = 27.271s
> avg 25.191 +/- .003162
>
> BENCHMARKs: VC: 24.926s VO: 1.903s A: 0.000s Sys: 0.182s = 27.010s
> BENCHMARKs: VC: 24.927s VO: 1.903s A: 0.000s Sys: 0.182s = 27.012s
> BENCHMARKs: VC: 24.926s VO: 1.900s A: 0.000s Sys: 0.182s = 27.008s
> BENCHMARKs: VC: 24.924s VO: 1.898s A: 0.000s Sys: 0.181s = 27.003s
> avg 24.9258 +/- .001258
nice :)
>
> This is a 2.16GHz Intel Core Duo, so I expect most other people will
> see a bigger change.
>
> hl_decode_mb_simple is 880 instructions vs. 2018 for the general one.
>
> _simple inlines backup_mb_border and xchg_mb_border, which still have
> checks for grayscale. For some reason when I removed them it actually
> got slower. I guess this is because it gives gcc's register allocator
> more live variables at once?
>
> Any comments on this are appreciated.
ok, first, tabs are forbidden in svn
second, could you try something like:
static always_inline void hl_decode_mb_internal(H264Context *h, int complex){
    ...
    if(complex){
        /* interlacing and other complex code */
    }
    ...
    if( ...
    ...
}
static void hl_decode_mb_simple(H264Context *h){
    hl_decode_mb_internal(h, 0);
}
static void hl_decode_mb_complex(H264Context *h){
    hl_decode_mb_internal(h, 1);
}
that prevents code duplication (which is definitely bad for the already pretty
large h264.c)
or even keeping a single hl_decode_mb() but splitting the mbaff out
into other av_noinline functions (though this might have a negative
impact on the mbaff speed?)
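
To make the first idea concrete, here is a tiny standalone sketch of the
pattern. It is not h264.c code: DemoContext, its fields, the demo_* names and
the DEMO_ALWAYS_INLINE macro are all made up for illustration, and the macro
assumes gcc-style attributes. The point is only that each wrapper passes a
compile-time constant, so the compiler can drop the if(complex) branch from
the simple variant while the source exists once.

```c
#include <assert.h>

/* Hypothetical stand-in for H264Context; NOT the real FFmpeg struct. */
typedef struct DemoContext {
    int interlaced; /* stands in for the rare "complex" state */
    int luma;       /* toy payload so the functions compute something */
} DemoContext;

/* Assumption: gcc/clang attribute syntax; plain inline elsewhere. */
#if defined(__GNUC__)
#define DEMO_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define DEMO_ALWAYS_INLINE inline
#endif

/* Single shared body. "complex" is a constant at every call site below,
 * so after inlining the dead branch can be removed per variant. */
static DEMO_ALWAYS_INLINE int demo_decode_internal(DemoContext *c, int complex)
{
    int out = c->luma * 2;      /* work common to both variants */
    if (complex) {
        if (c->interlaced)
            out += 1;           /* rare-case work, only in the complex copy */
    }
    return out;
}

static int demo_decode_simple(DemoContext *c)
{
    return demo_decode_internal(c, 0);
}

static int demo_decode_complex(DemoContext *c)
{
    return demo_decode_internal(c, 1);
}

/* Hypothetical dispatcher: pick the variant once per call. */
static int demo_decode(DemoContext *c)
{
    assert(c);
    return c->interlaced ? demo_decode_complex(c) : demo_decode_simple(c);
}
```

Whether the simple variant really ends up smaller has to be checked in the
generated code and with the benchmark; the sketch only shows the structure.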
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates