[FFmpeg-devel] [PATCH] RV40 Loop Filter (again)

Sat Nov 15 14:44:26 CET 2008

On Sat, Nov 15, 2008 at 01:24:32PM +0100, Michael Niedermayer wrote:
> On Sat, Nov 15, 2008 at 12:49:47PM +0200, Kostya wrote:
> > On Sat, Nov 15, 2008 at 11:10:26AM +0100, Michael Niedermayer wrote:
> > > On Sat, Nov 15, 2008 at 11:27:41AM +0200, Kostya wrote:
> > > > On Fri, Nov 14, 2008 at 06:23:30PM +0100, Michael Niedermayer wrote:
> > > > > On Fri, Nov 14, 2008 at 09:14:46AM +0200, Kostya wrote:
> > > > > > On Wed, Nov 12, 2008 at 07:46:32PM +0100, Michael Niedermayer wrote:
> > > > > > > On Wed, Nov 12, 2008 at 09:05:11AM +0200, Kostya wrote:
> > [...]
> > > > > > +            p0 = RV40_STRONG_FILTER(src, step, -1, rv40_dither_l[dmode + i]);
> > > > > > +            p1 = RV40_STRONG_FILTER(src, step,  0, rv40_dither_r[dmode + i]);
> > > > > > +            diff[0] = src[-1*step];
> > > > > > +            diff[1] = src[ 0*step];
> > > > > > +            src[-1*step] = sflag ? av_clip(p0, src[-1*step] - lims, src[-1*step] + lims) : p0;
> > > > > > +            src[ 0*step] = sflag ? av_clip(p1, src[ 0*step] - lims, src[ 0*step] + lims) : p1;
> > > > > > +            diff[0] -= src[-1*step];
> > > > > > +            diff[1] -= src[ 0*step];
> > > > > > +            p0 = RV40_STRONG_FILTER(src, step, -2, rv40_dither_l[dmode + i] + diff[1]*25);
> > > > > > +            p1 = RV40_STRONG_FILTER(src, step,  1, rv40_dither_r[dmode + i] + diff[0]*25);
> > > > > 
> > > > > I have my doubts about the order of operations being correct
> > > > > are you sure its not
> > > > > a = RV40_STRONG_FILTER(src, step, -2, rv40_dither_l[dmode + i]);
> > > > > b = RV40_STRONG_FILTER(src, step, -1, rv40_dither_l[dmode + i]);
> > > > > c = RV40_STRONG_FILTER(src, step,  0, rv40_dither_r[dmode + i]);
> > > > > d = RV40_STRONG_FILTER(src, step,  1, rv40_dither_r[dmode + i]);
> > > > > ...
> > > > > ?
> > > > > 
> > > > > also the diff[] just compensates for the value being overwritten due to the
> > > > > order of operations.
> > > > 
> > > > I've verified that against binary decoder.
> > > > As you can see, all filters are interlocked since they all use src[-2], src[-1], src[0]
> > > > and src[1] values but only one of them is used with the old value (the farthest one in
> > > > filtering operation).
> > > > I don't see any easy way to untangle them.
> > > 
> > > p0= (26*(s[-1] + s[ 0] + s[1]) + 25*(s[-2] + s[2]) + offl)>>7;
> > > q0= (26*(s[ 0] + s[ 1] + s[2]) + 25*(s[-1] + s[3]) + offr)>>7;
> > > p0= clipwhatever(p0)
> > > q0= clipwhatever(q0)
> > > p1= (26*(s[-2] + s[-1] + p0  ) + 25*(s[-3] + s[1]) + offl)>>7;
> > > q1= (26*(q0    + s[ 2] + s[3]) + 25*(s[ 0] + s[4]) + offr)>>7;
> >  
> > Exactly. But it's easier to me to include difference into bias then change filter. 
> 
> IMHO above is clearer than your code, besides being faster i suspect
> 
> 
> >  
> > > > > [...]
> > > > > > +            /* This pattern contains bits signalling that horizontal edges of
> > > > > > +             * the current block can be filtered.
> > > > > > +             * That happens when either of adjacent subblocks is coded or lies on
> > > > > > +             * the edge of 8x8 blocks with motion vectors differing by more than
> > > > > > +             * 3/4 pel in any component.
> > > > > > +             */
> > > > > > +            y_h_deblock =   cbp[POS_CUR]
> > > > > > +                        | ((cbp[POS_BOTTOM]     & MASK_Y_TOP_ROW)  << 16)
> > > > > > +                        | ((cbp[POS_CUR]                           <<  4) & ~MASK_Y_TOP_ROW)
> > > > > > +                        | ((cbp[POS_TOP]        & MASK_Y_LAST_ROW) >> 12)
> > > > > > +                        |   mvmasks[POS_CUR]
> > > > > > +                        | ((mvmasks[POS_BOTTOM] & MASK_Y_TOP_ROW)  << 16);
> > > > > > +            /* This pattern contains bits signalling that vertical edges of
> > > > > > +             * the current block can be filtered.
> > > > > > +             * That happens when either of adjacent subblocks is coded or lies on
> > > > > > +             * the edge of 8x8 blocks with motion vectors differing by more than
> > > > > > +             * 3/4 pel in any component.
> > > > > > +             */
> > > > > > +            y_v_deblock =   cbp[POS_CUR]
> > > > > > +                        | ((cbp[POS_CUR]                      << 1) & ~MASK_Y_LEFT_COL)
> > > > > > +                        | ((cbp[POS_LEFT] & MASK_Y_RIGHT_COL) >> 3)
> > > > > > +                        |   mvmasks[POS_CUR];
> > > > > 
> > > > > the text and mvmasks do not match
> > > > 
> > > > why? RV40 uses combined mvmask for both horizontal and vertical MV.
> > > 
> > > The text does not speak of combining anything, and the combination must be
> > > documented very exactly IMHO.
> > 
> > It says about edge of blocks with differing MVs, nothing is said about the direction,
> > so it's any edge. 
> 
> it clearly says:
>     That happens when either of adjacent subblocks is coded or lies on
>     the edge of 8x8 blocks with motion vectors differing by more than
>     3/4 pel in any component.
> 
> This clearly means the edge that is filtered, not some unrelated edge.

I'll make that "any edge, vertical or horizontal" then. With this codec
logic starts to abandon me.

> The problem iam having with this is that the current code
> 1. Makes no sense

so it suits the codec well :S

> 2. the decoder is not bitexact
> -> its hard for me to be sure the code is correct or even does what you
>    think it does ...

We have I-frames where my decoder is bitexact (before loop filter for sure
and after loop filter too), P-frames (there are bitexactness problems with them)
and B-frames (no one cares about them much if picture is more or less correct).

> > 
> > > Also is the decoder bitexact already?
> > > If not, why not?
> > 
> > I planned to verify another cases.
> > For now I can name only one known source of inexactness: motion compensation for luma.
> > H.264 uses 
> > (src[0] + (1*src[-2] - 5*src[-1] + 20*src[0] + 20*src[1] - 5*src[2] + src[3] + 16)>>5 + 1) >> 1
> > 
> > RV40 uses
> > 
> > (1*src[-2] - 5*src[-1] + 52*src[0] + 20*src[1] - 5*src[2] + src[3] + 32) >> 6
> > 
> > For example,
> > 20 20 20 |20| 22 23 23
> > results in 21 for H.264 and 20 for RV40
> 
> What stops you from implementing the RV40 MC variant?

Laziness, I suppose.
Maybe I'll send a patch for that next week if Ivan won't finish his.

> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB