[Ffmpeg-devel] [PATCH] H.264 deblocking mmx
Michael Niedermayer
michaelni
Mon Apr 25 01:32:21 CEST 2005
Hi
On Monday 25 April 2005 00:39, Loren Merritt wrote:
> I noticed that the inloop deblocking filter was taking a large fraction of
> the decode time, and it is inherently parallel, so...
>
> Benchmarks on my Athlon-XP:
> C:
> 4182 dezicycles in filter_mb_edgecv, 4193308 runs, 996 skips
> 4004 dezicycles in filter_mb_edgech, 4193305 runs, 999 skips
> 9930 dezicycles in filter_mb_edgev, 4191771 runs, 2533 skips
> 11200 dezicycles in filter_mb_edgeh, 4191510 runs, 2794 skips
>
> MMX:
> 2197 dezicycles in filter_mb_edgecv, 4193544 runs, 760 skips
> 1714 dezicycles in filter_mb_edgech, 4193733 runs, 571 skips
> 4928 dezicycles in filter_mb_edgev, 4192872 runs, 1432 skips
> 3977 dezicycles in filter_mb_edgeh, 4193087 runs, 1217 skips
>
> total: +17% decode speed
>
> ... however, I have reports that this patch crashes on some systems and
> doesn't even compile on amd64. So I'm offering it for anyone who wants to
> figure out what's broken.
[...]
>+ :: "r"(pix-3*stride), "r"(pix), "r"(stride),
>+ "r"(tc0), "r"(alpha), "r"(beta), "m"(ff_pw_4),
>+ "m"(tmp0), "m"(tmp1)
tmp0/tmp1 are listed here as input operands, but the asm writes into them, so
they have to be read-write ("+m") output operands. stride also needs to be
64-bit on amd64. This should be
: "+m"(tmp0), "+m"(tmp1)
: "r"(pix-3*stride), "r"(pix), "r"((long)stride),
"r"(tc0), "r"(alpha), "r"(beta), "m"(ff_pw_4)
btw, commit it if it works on your computer; we will fix amd64 and any other
issues as people provide bug reports ...
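
For illustration only, here is a minimal sketch of the two constraint fixes
suggested above. It is not taken from the patch: the function name
twice_at_offset and its body are hypothetical, and it assumes GCC-style
extended asm on x86 with an ABI where long is pointer-sized (as on Linux
amd64). The scratch variable is declared "+m" because the asm both writes and
reads it, and stride is cast to long so that the register used in the address
expression has the same width as the pointer register.

```c
/* Hypothetical example, not code from the patch: returns twice the long
 * stored `stride` bytes away from `base`, going through a scratch slot. */
static long twice_at_offset(const long *base, int stride)
{
    long tmp = 0;   /* scratch slot that the asm writes and then reads */
    long out;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile(
        "mov (%[base],%[stride]), %[out] \n\t" /* out = load from base + stride */
        "mov %[out], %[tmp]              \n\t" /* the asm writes tmp ...        */
        "add %[tmp], %[out]              \n\t" /* ... and reads it back         */
        : [out] "=&r"(out),          /* early-clobber: written before tmp is used */
          [tmp] "+m"(tmp)            /* read-write memory operand, not an input   */
        : [base] "r"(base),
          [stride] "r"((long)stride) /* widened so it can index a 64-bit pointer  */
        : "memory", "cc"             /* reads *base indirectly; add changes flags */
    );
#else
    /* Plain C fallback so the sketch still builds on other targets. */
    tmp = *(const long *)((const char *)base + stride);
    out = tmp + tmp;
#endif
    return out;
}
```

With long arr[2] = {5, 7}, calling twice_at_offset(arr, sizeof(long)) returns
14, and a negative stride works too because the cast sign-extends it. Passing
the int stride without the cast would put a 32-bit register next to a 64-bit
base in the address expression, which the assembler rejects on amd64.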
[...]
--
Michael
"nothing is evil in the beginning. Even Sauron was not so." -- Elrond