[FFmpeg-devel] [PATCH] h264.c/decode_cabac_residual optimization
Alexander Strange
astrange
Tue Jul 1 19:14:56 CEST 2008
1 - Don't check for encoding in hl_decode_mb, since there's no encoder.
2 - Turn off cloning hl_decode_mb under CONFIG_SMALL.
3 - Clone decode_cabac_residual depending on cat. This gets rid of
several cat checks and the if (qmul) for every coeff.
On Core 2:
Before-
24205 dezicycles in decode_cabac_residual, 1048505 runs, 71 skips
23079 dezicycles in decode_cabac_residual, 2096992 runs, 160 skips
22186 dezicycles in decode_cabac_residual, 4193933 runs, 371 skips
22444 dezicycles in decode_cabac_residual, 8387862 runs, 746 skips
After-
24035 dezicycles in decode_cabac_residual, 1048502 runs, 74 skips
22922 dezicycles in decode_cabac_residual, 2096981 runs, 171 skips
22037 dezicycles in decode_cabac_residual, 4193955 runs, 349 skips
22293 dezicycles in decode_cabac_residual, 8387927 runs, 681 skips
decode_cabac_residual's run time is definitely not normally distributed, so I
wouldn't trust the cycle counts for much, but at least they went down.
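
To illustrate the idea behind item 3, here is a minimal sketch of cloning a
function by passing a compile-time constant to a forced-inline body. It is not
the code from the patch: store_coeffs, store_coeffs_dc, store_coeffs_scaled and
the is_dc flag are hypothetical names, and the dequantization expression is
only a stand-in. The always_inline attribute is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared body. Because it is forced inline and is_dc is a
 * constant at every call site, the compiler can drop the per-coefficient
 * test from each clone instead of evaluating it in the loop. */
static inline __attribute__((always_inline))
void store_coeffs(int16_t *block, const int16_t *coeff, int n,
                  const uint32_t *qmul, int is_dc)
{
    for (int i = 0; i < n; i++) {
        if (is_dc)
            block[i] = coeff[i];                 /* no scaling */
        else                                     /* stand-in dequantization */
            block[i] = (int16_t)((coeff[i] * (int)qmul[i] + 32) >> 6);
    }
}

/* Clone 1: the unscaled case, qmul is never read. */
static void store_coeffs_dc(int16_t *block, const int16_t *coeff, int n)
{
    store_coeffs(block, coeff, n, NULL, 1);
}

/* Clone 2: the scaled case, qmul is always read. */
static void store_coeffs_scaled(int16_t *block, const int16_t *coeff, int n,
                                const uint32_t *qmul)
{
    store_coeffs(block, coeff, n, qmul, 0);
}
```

The cost is code size, since each clone carries its own copy of the loop. That
is presumably why item 2 turns cloning off under CONFIG_SMALL.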
4 - Reindentation.
5 - Simplify "for( coeff_count--; coeff_count >= 0; coeff_count-- )".
6 - Reorder the cat checks based on frequency, mostly from counting
block types in x264 output. gcc statically predicts an == comparison in
an if to be false, so the more common case should go in the else block.
Moving cat == 5 to the top of the if/elses also makes it more obvious
that it can be merged into the significance map.
7 - Use get_cabac_bypass_sign in both branches of the loop.
Before-
24043 dezicycles in decode_cabac_residual, 524159 runs, 129 skips
24462 dezicycles in decode_cabac_residual, 1048335 runs, 241 skips
23323 dezicycles in decode_cabac_residual, 2096708 runs, 444 skips
22416 dezicycles in decode_cabac_residual, 4193483 runs, 821 skips
After-
23318 dezicycles in decode_cabac_residual, 524112 runs, 176 skips
23712 dezicycles in decode_cabac_residual, 1048286 runs, 290 skips
22608 dezicycles in decode_cabac_residual, 2096607 runs, 545 skips
21731 dezicycles in decode_cabac_residual, 4193412 runs, 892 skips
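
For item 5, the attachment name (5-whileloop.diff) suggests the loop became a
while loop, but the patch body is not in the archive, so the following is only
one plausible rewrite. walk_for and walk_while are hypothetical helpers that
record the order in which indices are visited; the two forms match as long as
coeff_count is not negative on entry.

```c
/* Original form from the post: visits indices coeff_count-1 down to 0. */
static int walk_for(const int *coeff, int coeff_count)
{
    int acc = 0;
    for (coeff_count--; coeff_count >= 0; coeff_count--)
        acc = acc * 10 + coeff[coeff_count];
    return acc;
}

/* Assumed simplification: the post-decrement tests the old value and leaves
 * the new one as the index, so the same indices are visited in the same
 * order. Only valid when coeff_count >= 0 on entry. */
static int walk_while(const int *coeff, int coeff_count)
{
    int acc = 0;
    while (coeff_count--)
        acc = acc * 10 + coeff[coeff_count];
    return acc;
}
```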
There are some interesting things to look at after these-
- gcc pointlessly unrolls the "while( coeff_abs < 15 && get_cabac( CC,
ctx ) )" loop into taking up about half of the compiled function when
x86 asm is used, so it could be rewritten in asm to fix that.
- get_cabac_bypass* could use cmov, and refill() is so small it might
take less time than a branch mispredict penalty.
- The asm operands for get_cabac itself could be relaxed (not
requiring output to eax, using +m/+g, that kind of thing). I tried
this earlier and got bad results, but maybe it'll work better after
this.
- refill* should use AV_RB16.
- I'm not sure what CABAC_ON_STACK is used for.
But I don't want to spend too much time not doing SoC - someone else
can try those if they want.
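
For anyone picking these up, two of the ideas above can be sketched in plain
C. None of this is FFmpeg code: read_be16 is a hypothetical stand-in for what
a big-endian 16-bit load such as AV_RB16 computes, and apply_sign_mask and
select_mask are hypothetical helpers showing the mask style of branch-free
code that a compiler may (or may not) turn into cmov or plain ALU ops. The
mask helpers assume two's complement and a mask of exactly 0 or -1.

```c
#include <stdint.h>

/* Hypothetical helper: load 16 bits stored most significant byte first. */
static inline unsigned read_be16(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Hypothetical helper: negate val when mask is -1, keep it when mask is 0.
 * val ^ -1 is ~val, and ~val + 1 is -val, so no branch is needed. */
static inline int apply_sign_mask(int val, int mask)
{
    return (val ^ mask) - mask;
}

/* Hypothetical helper: branch-free "mask ? a : b" for mask in {0, -1}. */
static inline int select_mask(int mask, int a, int b)
{
    return (a & mask) | (b & ~mask);
}
```

Whether this beats a well-predicted branch depends on the data, so it would
need the same kind of timing as above before it is worth committing.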
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1-noencoding.diff
Type: text/x-diff
Size: 775 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080701/5325b42e/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2-hldecodesmall.diff
Type: text/x-diff
Size: 972 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080701/5325b42e/attachment-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3-cabacresidual-specialize.diff
Type: text/x-diff
Size: 5450 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080701/5325b42e/attachment-0002.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 4-reindent.diff
Type: text/x-diff
Size: 2952 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080701/5325b42e/attachment-0003.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 5-whileloop.diff
Type: text/x-diff
Size: 471 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080701/5325b42e/attachment-0004.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6-reorder-cat-if.diff
Type: text/x-diff
Size: 1974 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080701/5325b42e/attachment-0005.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 7-bypasssign.diff
Type: text/x-diff
Size: 823 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080701/5325b42e/attachment-0006.diff>