[MPlayer-dev-eng] spp deblocking GREAT optimization !!!

Fri Sep 3 17:28:09 CEST 2004

Hello,

>> NP> You can get a 4-6x speedup of the SPP filter by decomposing vert and
>> NP> horiz 1d dct/idct and decimating the horizontal ones. This way, the
>> NP> number of horiz passes will be 4 times lower for levels 4 & 5, or
>> NP> 8 times lower for level 6 (which is rather useless, as noted in the
>> NP> original paper). Vert passes are more suitable for optimization :)
>>
>> NP> Next, you can use an implied non-flat threshold with AAN dct w/o scale.
>> NP> (BTW, it is an interesting question - which threshold matrix provides
>> NP> the best psnr?)

> if u did implement it and it is faster while still giving the same result
> besides rounding differences, u really should submit the code
> if u didn't implement it, u should, instead of asking others, we really don't
> lack ideas but code & time to write it

My implementation is very similar to the original code, and it is more
than 4 times faster.
Actually, its output matches that of the original code when the original
uses this threshold function:

static void hardthresh_c(DCTELEM dst[64], DCTELEM src[64], int qp, uint8_t *permutation){
    int i,j,t;
    int bias= 0; //FIXME
    uint32_t thresholdm[64];

    t=qp*((1<<4) - bias) - 1;
    thresholdm[0]=t;
    // row 0: only the horizontal frequency j contributes a factor
    for (j=1;j<8;j++)
      thresholdm[j]=(int)rint(t / (cosl(j*acosl(-1.0)/(long double)16.0)*sqrtl(2)));
    // rows 1..7: column 0 gets the vertical factor, the rest get both factors
    for (i=1;i<8;i++) {
      thresholdm[i*8]=(int)rint(t / (cosl(i*acosl(-1.0)/(long double)16.0)*sqrtl(2)));
      for (j=1;j<8;j++) {
        thresholdm[i*8+j]=(int)rint(t /
                           ((cosl(i*acosl(-1.0)/(long double)16.0)*sqrtl(2))*
                           (cosl(j*acosl(-1.0)/(long double)16.0)*sqrtl(2))));
      }
    }

    memset(dst, 0, 64*sizeof(DCTELEM));
    dst[0]= (src[0] + 4)>>3;
    for(i=1; i<64; i++){
        int level= src[i];
        if(((unsigned)(level+thresholdm[i]))>2*thresholdm[i]){
            const int j= permutation[i];
            dst[j]= (level + 4)>>3;
        }
    }
}

(See the "Next, you can use implied ..." above)
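
A side note on the comparison inside the loop above: the single unsigned
compare is a shortcut for "abs(level) > threshold". Below is a minimal sketch
that checks this equivalence over a small range. The helper names
(exceeds_unsigned, exceeds_abs, count_mismatches) are made up for this
illustration and are not part of the filter; it assumes levels and thresholds
stay far below the int range, as they do for DCT coefficients.

```c
#include <assert.h>
#include <stdlib.h>

/* The trick used in hardthresh_c: level + t wraps around to a huge
 * unsigned value when level < -t, so one compare covers both signs. */
static int exceeds_unsigned(int level, unsigned t)
{
    return (unsigned)level + t > 2u * t;
}

/* The straightforward form of the same test. */
static int exceeds_abs(int level, unsigned t)
{
    return (unsigned)abs(level) > t;
}

/* Count the levels in [lo, hi] where the two forms disagree. */
static int count_mismatches(int lo, int hi, unsigned t)
{
    int level, bad = 0;

    assert(lo <= hi);
    for (level = lo; level <= hi; level++)
        if (exceeds_unsigned(level, t) != exceeds_abs(level, t))
            bad++;
    return bad;
}
```

A coefficient equal to +t or -t is zeroed by both forms; only values
strictly beyond the threshold survive.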

For an original (flat) threshold of 100, it gives this threshold matrix:

100   72   77   85  100  127  185  362
 72   52   55   61   72   92  133  261
 77   55   59   65   77   97  141  277
 85   61   65   72   85  108  157  308
100   72   77   85  100  127  185  362
127   92   97  108  127  162  235  461
185  133  141  157  185  235  341  670
362  261  277  308  362  461  670 1314

Can somebody compare the PSNR of this code against the original flat
threshold? Or even determine the optimum matrix (which will be multiplied
by the quantizer) PSNR-wise? This would be very interesting. Probably
genetic algorithms can help :). I am not really able to do this myself.

-- 
Best regards,
 Nikolaj                          mailto:nialpof at pisem.net