[Ffmpeg-devel] [PATCH] H.264 deblocking mmx
    Michael Niedermayer 
    michaelni
       
    Mon Apr 25 01:32:21 CEST 2005
    
    
  
Hi
On Monday 25 April 2005 00:39, Loren Merritt wrote:
> I noticed that the inloop deblocking filter was taking a large fraction of
> the decode time, and it is inherently parallel, so...
>
> Benchmarks on my Athlon-XP:
> C:
> 4182 dezicycles in filter_mb_edgecv, 4193308 runs, 996 skips
> 4004 dezicycles in filter_mb_edgech, 4193305 runs, 999 skips
> 9930 dezicycles in filter_mb_edgev, 4191771 runs, 2533 skips
> 11200 dezicycles in filter_mb_edgeh, 4191510 runs, 2794 skips
>
> MMX:
> 2197 dezicycles in filter_mb_edgecv, 4193544 runs, 760 skips
> 1714 dezicycles in filter_mb_edgech, 4193733 runs, 571 skips
> 4928 dezicycles in filter_mb_edgev, 4192872 runs, 1432 skips
> 3977 dezicycles in filter_mb_edgeh, 4193087 runs, 1217 skips
>
> total: +17% decode speed
>
> ... however, I have reports that this patch crashes on some systems and
> doesn't even compile on amd64. So I'm offering it for anyone who wants to
> figure out what's broken.
[...]
>+        :: "r"(pix-3*stride), "r"(pix), "r"(stride),
>+           "r"(tc0), "r"(alpha), "r"(beta), "m"(ff_pw_4),
>+           "m"(tmp0), "m"(tmp1)
tmp0/tmp1 are listed here as input operands, but the asm writes into them.
stride also needs to be 64-bit on amd64. This should be
        :  "+m"(tmp0), "+m"(tmp1)
        :  "r"(pix-3*stride), "r"(pix), "r"((long)stride),
           "r"(tc0), "r"(alpha), "r"(beta), "m"(ff_pw_4)
btw, commit it if it works on your computer; we will fix amd64 and any other 
issues as people provide bug reports ...
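
For reference, a minimal standalone sketch of the two constraint fixes above. It assumes GCC-style extended asm on x86; add_stride and its tmp variable are hypothetical and not part of the patch. A memory operand that the asm modifies is declared as a "+m" output rather than an "m" input, and the int stride is cast to long so the register operand is sign-extended to pointer width on amd64 (this assumes an LP64 target, where long is 64 bits).

```c
/* Hypothetical helper for illustration only, not code from the patch. */
static long add_stride(long start, int stride)
{
    long tmp = start;  /* memory slot the asm both reads and writes */
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile(
        "add %1, %0 \n\t"      /* tmp += stride (AT&T order: src, dst) */
        : "+m"(tmp)            /* read-write memory operand, so it is an output */
        : "r"((long)stride)    /* sign-extended, full-width register on amd64 */
        : "cc"                 /* add modifies the flags */
    );
#else
    tmp += (long)stride;       /* plain C fallback on non-x86 targets */
#endif
    return tmp;
}
```

If tmp were passed as a plain "m" input instead, the compiler would be free to assume the asm leaves it unchanged and could reuse a stale copy of its value.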
[...]
-- 
Michael
"nothing is evil in the beginning. Even Sauron was not so." -- Elrond
    
    