[FFmpeg-devel] [PATCH] swscale: round on planar2x C code

Sun Sep 12 13:11:15 CEST 2010

Michael Niedermayer <michaelni at gmx.at> writes:

> On Sun, Sep 12, 2010 at 10:30:18AM +0100, M?ns Rullg?rd wrote:
>> Ramiro Polla <ramiro.polla at gmail.com> writes:
>> 
>> > Hi,
>> >
>> > The C and MMX2/3dnow code differ in planar2x due to pavgb's rounding.
>> > Attached patch makes the output similar.
>> >
>> > I couldn't measure any speed difference. gcc ends up using "leal
>> > 3(xxx)" instead of "leal (xxx)" which doesn't seem to have a speed
>> > penalty.
>> 
>> What it does on x86 is irrelevant since the mmx code will always be
>> used in practice.
>
> no, the C code is used for the right border on mmx chips when the
> width is not a multiple of 8 and the C code
> used for that must round the same way as mmx.

That C code doesn't have to be the same as the reference.

>> > Otherwise we could put the MMX2/3dnow code under some if(flags &bitexact).
>> >
>> > Ramiro Polla
>> >
>> > Index: rgb2rgb_template.c
>> > ===================================================================
>> > --- rgb2rgb_template.c	(revision 32166)
>> > +++ rgb2rgb_template.c	(working copy)
>> > @@ -1820,10 +1820,10 @@ static inline void RENAME(planar2x)(const uint8_t
>> >          dst[dstStride]= (  src[0] + 3*src[srcStride])>>2;
>> >  
>> >          for (x=mmxSize-1; x<srcWidth-1; x++) {
>> > -            dst[2*x          +1]= (3*src[x+0] +   src[x+srcStride+1])>>2;
>> > -            dst[2*x+dstStride+2]= (  src[x+0] + 3*src[x+srcStride+1])>>2;
>> > -            dst[2*x+dstStride+1]= (  src[x+1] + 3*src[x+srcStride  ])>>2;
>> > -            dst[2*x          +2]= (3*src[x+1] +   src[x+srcStride  ])>>2;
>> > +            dst[2*x          +1]= ((3*src[x+0] +   src[x+srcStride+1])+3)>>2;
>> > +            dst[2*x+dstStride+2]= ((  src[x+0] + 3*src[x+srcStride+1])+3)>>2;
>> > +            dst[2*x+dstStride+1]= ((  src[x+1] + 3*src[x+srcStride  ])+3)>>2;
>> > +            dst[2*x          +2]= ((3*src[x+1] +   src[x+srcStride  ])+3)>>2;
>> 
>> WTF +3?  Does mmx round like that?  Most other CPUs with a rounding
>> average instruction do +4.

I obviously meant +2.

More generally, they do a rounding shift as (x + (1 << s-1)) >> s.

>> IMO the C version should be easily implemented exactly on the
>> majority of systems, not bow to the quirks of intel.
>
> The most widespread cpu architecture our users use cannot be ignored even if
> there was a problem but iam not seeing one atm, other SIMD systems are quite
> likely to round like mmx

I'm telling you they do not.

> so it makes sense to consider to change C to it too.  if you know of
> some cpu architecture that cant handle that way of rounding you
> should tell us

I just did: all except x86.

> so we can consider that in the decisson of what to do with the c
> code.

Which you are refusing to do.

-- 
M?ns Rullg?rd
mans at mansr.com