[FFmpeg-devel] [PATCH] RV40 loop filter, try 2

Sun Nov 2 13:54:56 CET 2008

On Fri, Oct 31, 2008 at 10:00:47AM +0200, Kostya wrote:
> On Fri, Oct 31, 2008 at 01:18:12AM +0100, Michael Niedermayer wrote:
> > On Thu, Oct 30, 2008 at 04:51:19PM +0200, Kostya wrote:
> > > $subj
> > > 
> > > I've tried to generalize, clean and document it.
> > 
> > its better but this still needs a lot more work to be readable.
> > the bit masks still are practically unreadable.
> > Its not so much a question of what is added into them but what the
> > variables actually are.
> > what i mean is things like
> > 
> > cbp // these are 16bits representing the coded block flags of the 4x4 luma
> >        blocks like:
> >     0  1  2  3
> >     4  5  6  7
> >     8  9 10 11
> >    12 13 14 15 where 0 is lsb and 15 is msb
> >    (msb)3x3, 2x3, 1x3, 0x3, 3x2, ...
> > 
> > this is especially absolutely essential for all the trickier loop filter
> > masks. mvmask, these h/v masks, ...
> 
> documented

IMHO the documentation is still extreemly poor

The documentation on its own must be able to awnser _which_ block of which
macroblock is how related to lets say bit 0x0008.

With such documentation it would be easier to understan dthe code and
find a better way to implement it

>  
> > I approximately know by now what the variable contain but i do not know it
> > with enough precission and detail to really understand the code and
> > review it properly. Now i surely could "reverse engeneer" this but i
> > really think that this should not be needed, the code should be easier
> > to understand, it shouldnt require every reader to analyze the code to find
> > out what everything is ...
> 
> I should have written in in CWEB instead... 
>  
> > > Index: libavcodec/rv40.c
> > > ===================================================================
> > > --- libavcodec/rv40.c	(revision 15732)
> > > +++ libavcodec/rv40.c	(working copy)
> > > @@ -247,7 +247,427 @@
> > >      return 0;
> > >  }
> > >  
> > > +#define CLIP_SYMM(a, b) av_clip(a, -(b), b)
> > >  /**
> > > + * weaker deblocking very similar to the one described in 4.4.2 of JVT-A003r1
> > > + */
> > > +static inline void rv40_weak_loop_filter(uint8_t *src, const int step,
> > > +                                         const int filter_outer, const int filter_inner,
> > > +                                         const int alpha,
> > > +                                         const int lim_inner, const int lim_outer,
> > > +                                         const int difflim, const int beta,
> > > +                                         const int S0, const int S1,
> > > +                                         const int S2, const int S3)
> > > +{
> > > +    uint8_t *cm = ff_cropTbl + MAX_NEG_CROP;
> > > +    int t, u, diff;
> > > +
> > > +    t = src[0*step] - src[-1*step];
> > > +    if(!t){
> > > +        return;
> > > +    }
> > > +    u = (alpha * FFABS(t)) >> 7;
> > > +    if(u > 3 - (filter_inner && filter_outer)){
> > > +        return;
> > > +    }
> > > +
> > > +    t <<= 2;
> > > +    if(filter_inner && filter_outer)
> > > +        t += src[-2*step] - src[1*step];
> > > +    diff = CLIP_SYMM((t + 4) >> 3, difflim);
> > > +    src[-1*step] = cm[src[-1*step] + diff];
> > > +    src[ 0*step] = cm[src[ 0*step] - diff];
> > > +    if(FFABS(S2) <= beta && filter_outer){
> > > +        t = (S0 + S2 - diff) >> 1;
> > > +        src[-2*step] = cm[src[-2*step] - CLIP_SYMM(t, lim_outer)];
> > > +    }
> > > +    if(FFABS(S3) <= beta && filter_inner){
> > > +        t = (S1 + S3 + diff) >> 1;
> > > +        src[ 1*step] = cm[src[ 1*step] - CLIP_SYMM(t, lim_inner)];
> > > +    }
> > > +}
> > 
> > inner and outer?
> > src[0*step]/src[-1*step] would be inner and src[-2*step]/src[ 1*step]
> > outer relative to the edge, so iam confused by the naming ...
> 
> Pixels lying on some filtered edge, pixels adjacent to it and outer ones. 

yes thats how i interpreted the names but this is not what the variables
contain. Thats why i said its confusing, i guess saying it nonsese would
be better.

[...]
> > [...]
> > > +static int rv40_set_deblock_coef(RV34DecContext *r)
> > > +{
> > > +    MpegEncContext *s = &r->s;
> > > +    int mvmask = 0, i, j;
> > > +    int midx = s->mb_x * 2 + s->mb_y * 2 * s->b8_stride;
> > > +    int16_t (*motion_val)[2] = s->current_picture_ptr->motion_val[0][midx];
> > > +    if(s->pict_type == FF_I_TYPE)
> > > +        return 0;
> > > +    for(i = 0; i < 2; i++){
> > > +        if(!s->first_slice_line && check_mv_difference(motion_val + i, s->b8_stride)){
> > > +            mvmask |= 0x03 << (i*2);
> > > +        }
> > > +        if(check_mv_difference(motion_val + s->b8_stride + i, s->b8_stride)){
> > > +            mvmask |= 0x03 << (i*2 + 8);
> > > +        }
> > > +    }
> > > +    for(j = 0; j < 2; j++){
> > > +        if(s->mb_x && check_mv_difference(motion_val, 1)){
> > > +            mvmask |= 0x11 << (j*8);
> > > +        }
> > > +        if(check_mv_difference(motion_val + 1, 1)){
> > > +            mvmask |= 0x11 << (2 + j*8);
> > > +        }
> > > +        motion_val += s->b8_stride;
> > > +    }
> > > +    return mvmask;
> > > +}
> > 
> > the s->first_slice_line and s->mb_x checks are still inside the loops.
> 
> I prefer not to read outside the picture bounds.

why? The array is allocated large enough exactly to avoid such
checks in the inner loops.

[...]

> +/**
> + * This macro is used for calculating 25*x0+26*x1+26*x2+26*x3+25*x4
> + * or 25*x0+26*x1+51*x2+26*x3

spaces surrounding the + would make this more readable
also documentation of foo should explain _what foo does_ not so much for
what it is used. Useage could change and represents just a subset of the
functionality ...

[...]
> +}
> +
> +static int check_mv_difference(int16_t (*motion_val)[2], int step)

the name is still poor
is_mv_diff_gt_3 would be shotrter and clearer

> +{
> +    int d;
> +    d = motion_val[0][0] - motion_val[-step][0];
> +    if(d < -3 || d > 3)
> +        return 1;
> +    d = motion_val[0][1] - motion_val[-step][1];
> +    if(d < -3 || d > 3)
> +        return 1;
> +    return 0;
> +}
> +
> +static int rv40_set_deblock_coef(RV34DecContext *r)
> +{
> +    MpegEncContext *s = &r->s;
> +    int mvmask = 0, i, j;
> +    int midx = s->mb_x * 2 + s->mb_y * 2 * s->b8_stride;
> +    int16_t (*motion_val)[2] = s->current_picture_ptr->motion_val[0][midx];
> +    if(s->pict_type == FF_I_TYPE)
> +        return 0;
> +    for(i = 0; i < 2; i++){
> +        if(!s->first_slice_line && check_mv_difference(motion_val + i, s->b8_stride)){
> +            mvmask |= 0x03 << (i*2);
> +        }
> +        if(check_mv_difference(motion_val + s->b8_stride + i, s->b8_stride)){
> +            mvmask |= 0x300 << (i*2);
> +        }
> +    }
> +    for(j = 0; j < 16; j += 8){
> +        if(s->mb_x && check_mv_difference(motion_val, 1)){
> +            mvmask |= 0x11 << j;
> +        }
> +        if(check_mv_difference(motion_val + 1, 1)){
> +            mvmask |= 0x44 << j;
> +        }
> +        motion_val += s->b8_stride;
> +    }
> +    return mvmask;
> +}

are you sure this code is correct? More specifically are you sure that
mv differences across vertical edges can cause filtering accros horizontal
edges to be enabled?
It really looks like 2 different things are stored in the same bits.

[...]
> +
> +/**
> + * RV40 loop filtering function
> + */
> +static void rv40_loop_filter(RV34DecContext *r)
> +{
> +    MpegEncContext *s = &r->s;
> +    int mb_pos;
> +    int i, j, k;
> +    uint8_t *Y, *C;
> +    int alpha, beta, betaY, betaC;
> +    int q;
> +    int mbtype[4];   ///< current macroblock and its neighbours types
> +    /**
> +     * macroblock filtering strength
> +     * 2 for intra coded MB and MB with DCs coded separately, 1 otherwise
> +     */
> +    int strength[4];
> +    int clip[4];     ///< MB filter clipping value calculated from filtering strength
> +    /**
> +     * coded block patterns for luma part of current macroblock and its neighbours
> +     * Format:
> +     * LSB corresponds to the top left block,
> +     * each nibble represents one row of subblocks.
> +     */
> +    int cbp[4];

> +    /**
> +     * coded block patterns for chroma part of current macroblock and its neighbours
> +     * Format is the same as for luma with two subblocks in a row.
> +     */
> +    int uvcbp[4][2];

> +    /**
> +     * This mask represents the pattern of luma subblocks that should be filtered

> +     * in addition to the coded ones because because they lie at the edge of
> +     * 8x8 block with different enough motion vectors
> +     * (see rv40_set_deblock_coef() for the details)
> +     */
> +    int mvmasks[4];

i think this variable should be documented in the doxy of
rv40_set_deblock_coef() which returns it.

> +
> +    if(s->pict_type == FF_B_TYPE)
> +        return;
> +
> +    for(s->mb_y = 0; s->mb_y < s->mb_height; s->mb_y++){
> +        mb_pos = s->mb_y * s->mb_stride;
> +        for(s->mb_x = 0; s->mb_x < s->mb_width; s->mb_x++, mb_pos++){
> +            int btype = s->current_picture_ptr->mb_type[mb_pos];
> +            if(IS_INTRA(btype) || IS_SEPARATE_DC(btype)){
> +                r->cbp_luma  [mb_pos] = 0xFFFF;
> +            }
> +            if(IS_INTRA(btype)){
> +                r->cbp_chroma[mb_pos] = 0xFF;
> +            }
> +        }
> +    }
> +    for(s->mb_y = 0; s->mb_y < s->mb_height; s->mb_y++){
> +        mb_pos = s->mb_y * s->mb_stride;
> +        for(s->mb_x = 0; s->mb_x < s->mb_width; s->mb_x++, mb_pos++){
> +            int y_h_deblock, y_v_deblock;
> +            int c_v_deblock[2], c_h_deblock[2];
> +
> +            ff_init_block_index(s);
> +            ff_update_block_index(s);
> +            Y = s->dest[0];
> +            q = s->current_picture_ptr->qscale_table[mb_pos];
> +            alpha = rv40_alpha_tab[q];
> +            beta  = rv40_beta_tab [q];
> +            betaY = betaC = beta * 3;
> +            if(s->width * s->height <= 176*144){
> +                betaY += beta;
> +            }
> +
> +            mvmasks[POS_CUR] = r->deblock_coefs[mb_pos];
> +            mbtype [POS_CUR] = s->current_picture_ptr->mb_type[mb_pos];
> +            cbp    [POS_CUR] = r->cbp_luma[mb_pos];
> +            uvcbp[POS_CUR][0] = r->cbp_chroma[mb_pos] & 0xF;
> +            uvcbp[POS_CUR][1] = r->cbp_chroma[mb_pos] >> 4;
> +            for(i = 1; i < 4; i++){
> +                mvmasks[i] = 0;
> +                mbtype [i] = mbtype[0];
> +                cbp    [i] = 0;
> +                uvcbp[1][0] = uvcbp[1][1] = 0;
> +            }
> +            if(s->mb_y){
> +                mvmasks[POS_TOP] = r->deblock_coefs[mb_pos - s->mb_stride] & MASK_Y_LAST_ROW;
> +                mbtype [POS_TOP] = s->current_picture_ptr->mb_type[mb_pos - s->mb_stride];
> +                cbp    [POS_TOP] = r->cbp_luma[mb_pos - s->mb_stride] & MASK_Y_LAST_ROW;
> +                uvcbp[POS_TOP][0] =  r->cbp_chroma[mb_pos - s->mb_stride]       & MASK_C_LAST_ROW;
> +                uvcbp[POS_TOP][1] = (r->cbp_chroma[mb_pos - s->mb_stride] >> 4) & MASK_C_LAST_ROW;
> +            }
> +            if(s->mb_x){
> +                mvmasks[POS_LEFT] = r->deblock_coefs[mb_pos - 1] & MASK_Y_RIGHT_COL;
> +                mbtype [POS_LEFT] = s->current_picture_ptr->mb_type[mb_pos - 1];
> +                cbp    [POS_LEFT] = r->cbp_luma[mb_pos - 1] & MASK_Y_RIGHT_COL;
> +                uvcbp[POS_LEFT][0] =  r->cbp_chroma[mb_pos - 1]       & MASK_C_RIGHT_COL;
> +                uvcbp[POS_LEFT][1] = (r->cbp_chroma[mb_pos - 1] >> 4) & MASK_C_RIGHT_COL;
> +            }
> +            if(s->mb_y < s->mb_height - 1){
> +                mvmasks[POS_BOTTOM] = r->deblock_coefs[mb_pos + s->mb_stride] & MASK_Y_TOP_ROW;
> +                mbtype [POS_BOTTOM] = s->current_picture_ptr->mb_type[mb_pos + s->mb_stride];
> +                cbp    [POS_BOTTOM] = r->cbp_luma[mb_pos + s->mb_stride] & MASK_Y_TOP_ROW;
> +                uvcbp[POS_BOTTOM][0] =  r->cbp_chroma[mb_pos + s->mb_stride]       & MASK_C_TOP_ROW;
> +                uvcbp[POS_BOTTOM][1] = (r->cbp_chroma[mb_pos + s->mb_stride] >> 4) & MASK_C_TOP_ROW;
> +            }

what is this code doing? it seems to just mask unused bits out

> +            for(i = 0; i < 4; i++){
> +                strength[i] = (IS_INTRA(mbtype[i]) || IS_SEPARATE_DC(mbtype[i])) ? 2 : 1;
> +                clip[i] = rv40_filter_clip_tbl[strength[i]][q];
> +            }

> +            /* This pattern contains bits signalling that horizontal edges of
> +             * the current block can be filtered.
> +             * It is composed from the coded block pattern for the current MB,
> +             * coded block pattern for the bottom MB, superposition of the current
> +             * and the top macroblock CBP shifted down by one row and an additional
> +             * mask derived from motion vectors.
> +             */
> +            y_h_deblock = cbp[POS_CUR]
> +                        | ((cbp[POS_CUR] << 4) & ~MASK_Y_TOP_ROW) | (cbp[POS_TOP] >> 12)
> +                        | ((cbp[POS_BOTTOM] << 20) & ~MASK_Y_TOP_ROW) | (cbp[POS_BOTTOM] << 16)
> +                        | mvmasks[POS_CUR] | (mvmasks[POS_CUR] << 16);

what is this code supposed to do?
this does (x<<4) & ~0xF and (x<<20) & ~0xF 
these operations obviously make no sense at all

also mvmasks[POS_CUR] | (mvmasks[POS_CUR] << 16) makes no sense in
combination with the other code

> +            /* This pattern contains bits signalling that left edge of
> +             * the current block can be filtered.
> +             * It is composed from the coded block pattern for the current MB,
> +             * superposition of the current and the left macroblock CBP shifted
> +             * right by one column and an additional mask derived from motion vectors.
> +             */

> +            y_v_deblock = cbp[POS_CUR]
> +                        | ((cbp[POS_CUR] << 1) & ~MASK_Y_LEFT_COL) | (cbp[POS_LEFT] >> 3)
> +                        | mvmasks[0];

0 supposed to be POS_CUR ?

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081102/c3aaa8ce/attachment.pgp>