[FFmpeg-devel] [PATCH] h264_cabac.c: branchless (amvd>2)+(amvd>32)
Zhou Zongyi
zhouzy
Fri Feb 26 16:28:32 CET 2010
Hi Michael,
in commit 22032:
>switch back to (amvd>2)+(amvd>32), its 5 cpu cycles faster now.
On x86, gcc seems to compute (amvd>2) with the following sequence:
xor reg, reg
cmp reg, 2
setg regb
This introduces a partial register access (setg writes only the low byte of the register), which is slow on most CPUs.
Here is my patch, which saves one instruction and avoids the partial register access.
Index: libavcodec/h264_cabac.c
===================================================================
--- libavcodec/h264_cabac.c (revision 22075)
+++ libavcodec/h264_cabac.c (working copy)
@@ -912,10 +912,12 @@
static int decode_cabac_mb_mvd( H264Context *h, int ctxbase, int amvd, int *mvda) {
int mvd;
- if(!get_cabac(&h->cabac, &h->cabac_state[ctxbase+(amvd>2)+(amvd>32)])){
+#define SHIFT (sizeof(int)*8-1) /* index of the sign bit of int */
+ if(!get_cabac(&h->cabac, &h->cabac_state[ctxbase+((amvd-3)>>SHIFT)+((amvd-33)>>SHIFT)+2])){
*mvda= 0;
return 0;
}
+#undef SHIFT
mvd= 1;
ctxbase+= 3;
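
The identity behind the new index expression can be checked in isolation. The following is only a minimal sketch, not part of the patch: the function and macro names are made up for illustration, and it assumes that right-shifting a negative int is an arithmetic shift (implementation-defined in C, but what gcc does on x86). Under that assumption (x >> sign bit index) is -1 for negative x and 0 otherwise, so each comparison becomes a subtract and a shift. A shift count smaller than the sign bit index gives the same result only while amvd-3 stays below 2^count.

```c
#include <assert.h>
#include <limits.h>

/* Index of the sign bit of int (hypothetical name, for illustration only). */
#define SIGN_SHIFT ((int)(sizeof(int) * CHAR_BIT - 1))

/* Comparison form, as in r22032. */
static int ctx_inc_compare(int amvd)
{
    return (amvd > 2) + (amvd > 32);
}

/* Branchless form: each shifted term is -1 when amvd is below the
 * threshold and 0 otherwise, so adding 2 yields 0, 1 or 2.
 * Assumes arithmetic right shift of negative ints. */
static int ctx_inc_shift(int amvd)
{
    return ((amvd - 3) >> SIGN_SHIFT) + ((amvd - 33) >> SIGN_SHIFT) + 2;
}

/* Asserts that both forms agree for every amvd in [0, limit]. */
static void check_equivalence(int limit)
{
    for (int amvd = 0; amvd <= limit; amvd++)
        assert(ctx_inc_compare(amvd) == ctx_inc_shift(amvd));
}
```

Calling check_equivalence(70000) from a small main() exercises both thresholds as well as values above 2^15, where a 15-bit shift would no longer agree with the comparison form.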
Regards,
ZZ