[FFmpeg-devel] [PATCH] swscale: round on planar2x C code
Måns Rullgård
mans
Sun Sep 12 14:07:11 CEST 2010
Michael Niedermayer <michaelni at gmx.at> writes:
> On Sun, Sep 12, 2010 at 12:11:15PM +0100, M?ns Rullg?rd wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
>>
>> > On Sun, Sep 12, 2010 at 10:30:18AM +0100, M?ns Rullg?rd wrote:
>> >> Ramiro Polla <ramiro.polla at gmail.com> writes:
>> >>
>> >> > Hi,
>> >> >
>> >> > The C and MMX2/3dnow code differ in planar2x due to pavgb's rounding.
>> >> > Attached patch makes the output similar.
>> >> >
>> >> > I couldn't measure any speed difference. gcc ends up using "leal
>> >> > 3(xxx)" instead of "leal (xxx)" which doesn't seem to have a speed
>> >> > penalty.
>> >>
>> >> What it does on x86 is irrelevant since the mmx code will always be
>> >> used in practice.
>> >
>> > no, the C code is used for the right border on mmx chips when the
>> > width is not a multiple of 8 and the C code
>> > used for that must round the same way as mmx.
>>
>> That C code doesn't have to be the same as the reference.
>
> no of course not but its simpler if it is
>
>>
>> >> > Otherwise we could put the MMX2/3dnow code under some if(flags &bitexact).
>> >> >
>> >> > Ramiro Polla
>> >> >
>> >> > Index: rgb2rgb_template.c
>> >> > ===================================================================
>> >> > --- rgb2rgb_template.c (revision 32166)
>> >> > +++ rgb2rgb_template.c (working copy)
>> >> > @@ -1820,10 +1820,10 @@ static inline void RENAME(planar2x)(const uint8_t
>> >> > dst[dstStride]= ( src[0] + 3*src[srcStride])>>2;
>> >> >
>> >> > for (x=mmxSize-1; x<srcWidth-1; x++) {
>> >> > - dst[2*x +1]= (3*src[x+0] + src[x+srcStride+1])>>2;
>> >> > - dst[2*x+dstStride+2]= ( src[x+0] + 3*src[x+srcStride+1])>>2;
>> >> > - dst[2*x+dstStride+1]= ( src[x+1] + 3*src[x+srcStride ])>>2;
>> >> > - dst[2*x +2]= (3*src[x+1] + src[x+srcStride ])>>2;
>> >> > + dst[2*x +1]= ((3*src[x+0] + src[x+srcStride+1])+3)>>2;
>> >> > + dst[2*x+dstStride+2]= (( src[x+0] + 3*src[x+srcStride+1])+3)>>2;
>> >> > + dst[2*x+dstStride+1]= (( src[x+1] + 3*src[x+srcStride ])+3)>>2;
>> >> > + dst[2*x +2]= ((3*src[x+1] + src[x+srcStride ])+3)>>2;
>> >>
>> >> WTF +3? Does mmx round like that? Most other CPUs with a rounding
>> >> average instruction do +4.
>>
>> I obviously meant +2.
>>
>> More generally, they do a rounding shift as (x + (1 << s-1)) >> s.
>
> that wont help us to implement this
> we need 3*a + b + whatever >>2 and we need it for 8bit unsigned values
> once we have to convert to 16bit for rounding shift instructions we already
> have lost 50% speed
>
> with mmx we use 2 a+b+1>>1 instructions
OK, makes sense.
>> >> IMO the C version should be easily implemented exactly on the
>> >> majority of systems, not bow to the quirks of intel.
>> >
>> > The most widespread cpu architecture our users use cannot be
>> > ignored even if there was a problem but iam not seeing one atm,
>> > other SIMD systems are quite likely to round like mmx
>>
>> I'm telling you they do not.
>
> then they dont support even mpeg MC which needs a+b+1>>1
> and i dont belive that
I got the impression mmx rounding shift added (1<<s)-1 rather than the
more common 1<<(s-1). Apparently this isn't the case after all.
Forget this discussion.
--
M?ns Rullg?rd
mans at mansr.com
More information about the ffmpeg-devel
mailing list