[FFmpeg-devel] [PATCH] avcodec/dvenc: support encoding dvcprohd

Sat Nov 2 21:08:46 EET 2019

On Thu, Sep 19, 2019 at 12:34 PM Michael Niedermayer <michael at niedermayer.cc>
wrote:

> On Wed, Sep 11, 2019 at 12:29:57PM -0700, Baptiste Coudurier wrote:
> > ---
> >  libavcodec/dv.h    |   1 +
> >  libavcodec/dvenc.c | 576 ++++++++++++++++++++++++++++++++++++++++-----
> >  2 files changed, 522 insertions(+), 55 deletions(-)
>
> a FATE test should be added for this if it's not already planned or done
>

I'm having issues with FATE on macOS Catalina right now :(

> [...]
>

> > +    /* LOOP1: weigh AC components and store to save[] */
> > +    /* (i=0 is the DC component; we only include it to make the
> > +       number of loop iterations even, for future possible SIMD optimization) */
> > +    for (i = 0; i < 64; i += 2) {
> > +        int level0, level1;
> > +
> > +        /* get the AC component (in zig-zag order) */
> > +        level0 = blk[zigzag_scan[i+0]];
> > +        level1 = blk[zigzag_scan[i+1]];
> > +
> > +        /* extract sign and make it the lowest bit */
> > +        bi->sign[i+0] = (level0>>31)&1;
> > +        bi->sign[i+1] = (level1>>31)&1;
> > +
> > +        /* take absolute value of the level */
> > +        level0 = FFABS(level0);
> > +        level1 = FFABS(level1);
> > +
> > +        /* weigh it */
> > +        level0 = (level0*weight[i+0] + 4096 + (1<<17)) >> 18;
> > +        level1 = (level1*weight[i+1] + 4096 + (1<<17)) >> 18;
> > +
> > +        /* save unquantized value */
> > +        bi->save[i+0] = level0;
> > +        bi->save[i+1] = level1;
> > +    }
> > +
> > +    /* find max component */
> > +    for (i = 0; i < 64; i++) {
> > +        int ac = bi->save[i];
> > +        if (ac > max)
> > +            max = ac;
> > +    }
>
> these 2 loops can be merged avoiding a 2nd pass
>

Merged
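
For readers following along, here is a minimal standalone sketch of what the
merged loop could look like. It is not the code from the updated patch:
SketchBlockInfo and weigh_and_find_max() are hypothetical names, the zigzag and
weight tables are passed in by the caller, and the two-at-a-time unrolling from
the original is dropped for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the encoder's per-block state. */
typedef struct SketchBlockInfo {
    uint8_t sign[64];   /* 1 if the coefficient was negative */
    int     save[64];   /* weighted, unquantized magnitude */
} SketchBlockInfo;

/* Weigh all 64 coefficients and track the maximum in the same pass.
 * Returns the largest weighted magnitude (index 0, the DC term, is
 * included, as in the quoted loops). */
static int weigh_and_find_max(SketchBlockInfo *bi, const int16_t *blk,
                              const uint8_t *zigzag, const int *weight)
{
    int max = 0;

    assert(bi && blk && zigzag && weight);

    for (int i = 0; i < 64; i++) {
        /* get the component in zig-zag order */
        int level = blk[zigzag[i]];

        /* record the sign, then work on the magnitude */
        bi->sign[i] = level < 0;
        level = abs(level);

        /* weigh it, same rounding as in the quoted patch */
        level = (level * weight[i] + 4096 + (1 << 17)) >> 18;

        /* save the value and update the running maximum */
        bi->save[i] = level;
        if (level > max)
            max = level;
    }
    return max;
}
```

The maximum is updated right after each value is stored, so save[] is never
read back in a second pass.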

[...]
> > +static inline void dv_guess_qnos_hd(EncBlockInfo *blks, int *qnos)
> > +{
> > +    EncBlockInfo *b;
> > +    int min_qlevel[5];
> > +    int qlevels[5];
> > +    int size[5];
> > +    int i, j;
> > +    /* cache block sizes at hypothetical qlevels */
> > +    uint16_t size_cache[5*8][DV100_NUM_QLEVELS] = {{0}};
> > +
> > +    /* get minimum qlevels */
> > +    for (i = 0; i < 5; i++) {
> > +        min_qlevel[i] = 1;
> > +        for (j = 0; j < 8; j++) {
> > +            if (blks[8*i+j].min_qlevel > min_qlevel[i])
> > +                min_qlevel[i] = blks[8*i+j].min_qlevel;
> > +        }
> > +    }
> > +
> > +    /* initialize sizes */
> > +    for (i = 0; i < 5; i++) {
> > +        qlevels[i] = dv100_starting_qno;
> > +        if (qlevels[i] < min_qlevel[i])
> > +            qlevels[i] = min_qlevel[i];
> > +
> > +        qnos[i] = DV100_QLEVEL_QNO(dv100_qlevels[qlevels[i]]);
> > +        size[i] = 0;
> > +        for (j = 0; j < 8; j++) {
> > +            size_cache[8*i+j][qlevels[i]] = dv100_actual_quantize(&blks[8*i+j], qlevels[i]);
> > +            size[i] += size_cache[8*i+j][qlevels[i]];
> > +        }
> > +    }
> > +
> > +    /* must we go coarser? */
> > +    if (size[0]+size[1]+size[2]+size[3]+size[4] > vs_total_ac_bits_hd) {
> > +        int largest = size[0] % 5; /* 'random' number */
> > +
>
> > +        do {
> > +            /* find the macroblock with the lowest qlevel */
> > +            for (i = 0; i < 5; i++) {
> > +                if (qlevels[i] < DV100_NUM_QLEVELS-1 &&
> > +                    qlevels[i] < qlevels[largest])
> > +                    largest = i;
> > +            }
> > +
> > +            i = largest;
> > +            /* ensure that we don't enter infinite loop */
> > +            largest = (largest+1) % 5;
> > +
> > +            if (qlevels[i] >= DV100_NUM_QLEVELS-1) {
> > +                /* can't quantize any more */
> > +                continue;
> > +            }
> > +
> > +            /* quantize a little bit more */
> > +            qlevels[i] += dv100_qlevel_inc;
> > +            if (qlevels[i] > DV100_NUM_QLEVELS-1)
> > +                qlevels[i] = DV100_NUM_QLEVELS-1;
> > +
> > +            qnos[i] = DV100_QLEVEL_QNO(dv100_qlevels[qlevels[i]]);
> > +            size[i] = 0;
> > +
> > +            /* for each block */
> > +            b = &blks[8*i];
> > +            for (j = 0; j < 8; j++, b++) {
> > +                /* accumulate block size into macroblock */
> > +                if(size_cache[8*i+j][qlevels[i]] == 0) {
> > +                    /* it is safe to use actual_quantize() here because we only go from finer to coarser,
> > +                       and it saves the final actual_quantize() down below */
> > +                    size_cache[8*i+j][qlevels[i]] = dv100_actual_quantize(b, qlevels[i]);
> > +                }
> > +                size[i] += size_cache[8*i+j][qlevels[i]];
> > +            } /* for each block */
> > +
> > +        } while (vs_total_ac_bits_hd < size[0] + size[1] + size[2] + size[3] + size[4] &&
> > +                 (qlevels[0] < DV100_NUM_QLEVELS-1 ||
> > +                  qlevels[1] < DV100_NUM_QLEVELS-1 ||
> > +                  qlevels[2] < DV100_NUM_QLEVELS-1 ||
> > +                  qlevels[3] < DV100_NUM_QLEVELS-1 ||
> > +                  qlevels[4] < DV100_NUM_QLEVELS-1));
>
> I think the DV100_NUM_QLEVELS checks can be simplified.
>
> If we keep track of how many qlevels are < DV100_NUM_QLEVELS-1,
> the check in the first loop is not needed: if there is one that is
> smaller, it will be found, so there is no need to check each against
> DV100_NUM_QLEVELS-1.
>
> Checking the smallest again against DV100_NUM_QLEVELS-1 then also
> becomes unneeded,
>
> and at the end the 5 checks in the while() can be changed to a
> single check on the new variable.
>
> This should make the code both faster and simpler.
>
>

Updated, please check :)
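
To make the suggestion concrete, here is a rough standalone sketch of the
counter-based loop as I understand it. It is an illustration only, not the code
in the updated patch: sk_coarsen(), sk_size_fn and sk_demo_size() are
hypothetical names, the size callback stands in for dv100_actual_quantize()
plus the size cache, and ties are broken by lowest index rather than by the
rotating start used in the quoted code. It assumes every starting qlevel is
already <= max_qlevel.

```c
#include <assert.h>

#define SK_NUM_MB 5

/* Hypothetical callback: size in bits of macroblock mb at a given qlevel. */
typedef int (*sk_size_fn)(int mb, int qlevel);

/* Toy size model for demonstration only: coarser qlevel, fewer bits. */
static int sk_demo_size(int mb, int qlevel)
{
    (void)mb;
    return 1000 / (qlevel + 1);
}

/* Coarsen macroblocks until the total fits the budget or all are at
 * max_qlevel. Returns the final total size. */
static int sk_coarsen(int qlevels[SK_NUM_MB], int size[SK_NUM_MB],
                      int max_qlevel, int inc, int budget, sk_size_fn size_at)
{
    int total = 0;
    int not_maxed = 0;   /* how many qlevels are still below max_qlevel */

    assert(inc > 0);

    for (int i = 0; i < SK_NUM_MB; i++) {
        size[i] = size_at(i, qlevels[i]);
        total  += size[i];
        if (qlevels[i] < max_qlevel)
            not_maxed++;
    }

    /* single check on the counter instead of five comparisons */
    while (total > budget && not_maxed > 0) {
        int lowest = 0;

        /* not_maxed > 0 guarantees the minimum is below max_qlevel,
         * so no per-entry bound check is needed here */
        for (int i = 1; i < SK_NUM_MB; i++)
            if (qlevels[i] < qlevels[lowest])
                lowest = i;

        /* quantize a little bit more, clamping at the coarsest level */
        qlevels[lowest] += inc;
        if (qlevels[lowest] >= max_qlevel) {
            qlevels[lowest] = max_qlevel;
            not_maxed--;
        }

        /* refresh this macroblock's contribution to the total */
        total -= size[lowest];
        size[lowest] = size_at(lowest, qlevels[lowest]);
        total += size[lowest];
    }
    return total;
}
```

Each iteration raises one qlevel by at least 1 and the counter only decreases,
so the loop terminates without the "can't quantize any more" branch.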

Patch updated

-- 
Baptiste