[FFmpeg-devel] [PATCH] swscale: round on planar2x C code

Sun Sep 12 11:30:18 CEST 2010

Ramiro Polla <ramiro.polla at gmail.com> writes:

> Hi,
>
> The C and MMX2/3dnow code differ in planar2x due to pavgb's rounding.
> Attached patch makes the output similar.
>
> I couldn't measure any speed difference. gcc ends up using "leal
> 3(xxx)" instead of "leal (xxx)" which doesn't seem to have a speed
> penalty.

What it does on x86 is irrelevant since the MMX code will always be
used in practice.

> Otherwise we could put the MMX2/3dnow code under some if(flags &bitexact).
>
> Ramiro Polla
>
> Index: rgb2rgb_template.c
> ===================================================================
> --- rgb2rgb_template.c	(revision 32166)
> +++ rgb2rgb_template.c	(working copy)
> @@ -1820,10 +1820,10 @@ static inline void RENAME(planar2x)(const uint8_t
>          dst[dstStride]= (  src[0] + 3*src[srcStride])>>2;
>  
>          for (x=mmxSize-1; x<srcWidth-1; x++) {
> -            dst[2*x          +1]= (3*src[x+0] +   src[x+srcStride+1])>>2;
> -            dst[2*x+dstStride+2]= (  src[x+0] + 3*src[x+srcStride+1])>>2;
> -            dst[2*x+dstStride+1]= (  src[x+1] + 3*src[x+srcStride  ])>>2;
> -            dst[2*x          +2]= (3*src[x+1] +   src[x+srcStride  ])>>2;
> +            dst[2*x          +1]= ((3*src[x+0] +   src[x+srcStride+1])+3)>>2;
> +            dst[2*x+dstStride+2]= ((  src[x+0] + 3*src[x+srcStride+1])+3)>>2;
> +            dst[2*x+dstStride+1]= ((  src[x+1] + 3*src[x+srcStride  ])+3)>>2;
> +            dst[2*x          +2]= ((3*src[x+1] +   src[x+srcStride  ])+3)>>2;

WTF +3?  Does MMX round like that?  Most other CPUs with a rounding
average instruction do +4.  IMO the C version should be easy to
implement exactly on the majority of systems, not bow to the quirks
of Intel.
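
For reference, the +3 bias can be checked against a chained rounding
average. The sketch below assumes the SIMD path builds the 3:1 blend
from two pavgb steps, avg(a, avg(a, b)), with pavgb modelled as
(a + b + 1) >> 1. That structure is an assumption for illustration,
not quoted from rgb2rgb_template.c, and all function names here are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of pavgb: byte average rounded up, (a + b + 1) >> 1. */
static uint8_t pavgb(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Assumed SIMD-style 3:1 blend: two chained rounding averages. */
static uint8_t blend_chained(uint8_t a, uint8_t b)
{
    return pavgb(a, pavgb(a, b));
}

/* C expression from the patch: (3*a + b + 3) >> 2. */
static uint8_t blend_plus3(uint8_t a, uint8_t b)
{
    return (uint8_t)((3 * a + b + 3) >> 2);
}

/* Count the byte pairs (a, b) for which the two formulas disagree. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (blend_chained((uint8_t)a, (uint8_t)b) !=
                blend_plus3((uint8_t)a, (uint8_t)b))
                mismatches++;
    return mismatches;
}

/* Exhaustive self-check over all 65536 input pairs. */
static void check_plus3_matches_chained(void)
{
    assert(count_mismatches() == 0);
}
```

Under that assumption the two forms agree for every byte pair, so the
+3 would come from rounding up twice rather than from a single
round-to-nearest step (which for >>2 would add 2).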

-- 
Måns Rullgård
mans at mansr.com