[MPlayer-dev-eng] [PATCH]: hqdn3d.c: refactorize LowPassMul in macro, 10%~20% faster on Athlon X2
Loren Merritt
lorenm at u.washington.edu
Sat Jan 10 19:25:53 CET 2009
On Fri, 9 Jan 2009, Zhou Zongyi wrote:
> And any idea about SIMD optimization on this? I tried SSE2 on
> deNoiseTemporal but it runs even slower than original C code.
A SIMD version would have to compute the gamma curve for every pixel, while
the scalar code can read it from a LUT. That basically kills any such idea.
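To make the LUT argument concrete, here is a rough sketch of the scalar approach. It is not the code from vf_hqdn3d.c: the names (`build_lut`, `lowpass`, `MAX_DIFF`), the plain 8-bit difference range, and the integer exponent (used so the example avoids pow()) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DIFF 255  /* assumed 8-bit pixels, so differences span -255..255 */

/* Hypothetical table: lut[d + MAX_DIFF] is the correction for difference d. */
static double lut[2 * MAX_DIFF + 1];

/* Integer power by repeated multiplication (a stand-in for pow()). */
static double ipow(double base, int exp)
{
    double r = 1.0;
    while (exp-- > 0)
        r *= base;
    return r;
}

/* The gamma curve is evaluated once per table entry, not once per pixel. */
static void build_lut(int gamma)
{
    for (int d = -MAX_DIFF; d <= MAX_DIFF; d++) {
        double simil = 1.0 - abs(d) / (double)MAX_DIFF;
        lut[d + MAX_DIFF] = ipow(simil, gamma) * d;
    }
}

/* Per-pixel work: one subtract, one table load, one add. */
static double lowpass(int prev, int curr)
{
    int d = prev - curr;
    assert(d >= -MAX_DIFF && d <= MAX_DIFF);
    return curr + lut[d + MAX_DIFF];
}
```

The table load is indexed by a value that differs per pixel, and SSE2 has no gather instruction, so a vector version would have to either extract each lane for a scalar load or evaluate the curve arithmetically in every lane.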
I get an 18% speedup with the attached patch on Conroe, x86_64, gcc-4.2.3.
It just removes some sign extensions from the inner loop.
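As a sketch of what the sign-extension change amounts to, the two helpers below compute the same table index as LowPassMul, once as `int` and once as `unsigned int`. The helper names and the idea of passing the table as a parameter are made up for this example. On x86_64 a signed 32-bit index generally has to be sign-extended to 64 bits before it can be used in an address, while an unsigned one is already zero-extended by the 32-bit operation that produced it. Whether the compiler really emits the extra instruction depends on the compiler and version, so treat this as an illustration, not a measurement.

```c
#include <assert.h>

/* Signed index: coef[d] may need a sign-extend before the load. */
static int lookup_signed(const int *coef, unsigned int prev, unsigned int curr)
{
    int dmul = prev - curr;  /* wraps, then converted to int as in the original */
    int d = (dmul + 0x10007FF) >> 12;
    return coef[d];
}

/* Unsigned index: same value as long as the sum is non-negative. */
static int lookup_unsigned(const int *coef, unsigned int prev, unsigned int curr)
{
    int dmul = prev - curr;
    unsigned int d = (dmul + 0x10007FF) >> 12;
    return coef[d];
}
```

Both versions return the same entry only because the biased sum stays non-negative for the inputs the filter produces. The `int` to `long` changes for X, Y and the line offsets follow the same reasoning, assuming an LP64 target where `long` is 64 bits wide.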
--Loren Merritt
-------------- next part --------------
Index: libmpcodecs/vf_hqdn3d.c
===================================================================
--- libmpcodecs/vf_hqdn3d.c (revision 28192)
+++ libmpcodecs/vf_hqdn3d.c (working copy)
@@ -70,7 +70,7 @@
static inline unsigned int LowPassMul(unsigned int PrevMul, unsigned int CurrMul, int* Coef){
// int dMul= (PrevMul&0xFFFFFF)-(CurrMul&0xFFFFFF);
int dMul= PrevMul-CurrMul;
- int d=((dMul+0x10007FF)>>12);
+ unsigned int d=((dMul+0x10007FF)>>12);
return CurrMul + Coef[d];
}
@@ -81,7 +81,7 @@
int W, int H, int sStride, int dStride,
int *Temporal)
{
- int X, Y;
+ long X, Y;
unsigned int PixelDst;
for (Y = 0; Y < H; Y++){
@@ -103,8 +103,8 @@
int W, int H, int sStride, int dStride,
int *Horizontal, int *Vertical)
{
- int X, Y;
- int sLineOffs = 0, dLineOffs = 0;
+ long X, Y;
+ long sLineOffs = 0, dLineOffs = 0;
unsigned int PixelAnt;
unsigned int PixelDst;
@@ -143,8 +143,8 @@
int W, int H, int sStride, int dStride,
int *Horizontal, int *Vertical, int *Temporal)
{
- int X, Y;
- int sLineOffs = 0, dLineOffs = 0;
+ long X, Y;
+ long sLineOffs = 0, dLineOffs = 0;
unsigned int PixelAnt;
unsigned int PixelDst;
unsigned short* FrameAnt=(*FrameAntPtr);