[Ffmpeg-devel] a little optim for a SSE version of H263_LOOP_FILTER
skal
skal65535
Sun Nov 12 21:15:13 CET 2006
Hi Konstantin and all,
> On Fri, Nov 10, 2006 at 11:48:16PM +0100, skal wrote:
> > btw, while i have the mike:
> >
> > seems to me the following replacement functions for
> > vc1_v_overlap_c() and vc1_h_overlap_c() in vc1dsp.c:31
> > are likely to be faster (and bitwise equivalent of course)
> >
> > static void vc1_v_overlap_c(uint8_t* src, int stride, int rnd)
> > {
> >     int i;
> >     for(i = 0; i < 8; i++) {
> >         const int a = src[-2*stride];
> >         const int b = src[-stride];
> >         const int c = src[0];
> >         const int d = src[stride];
> >         const int d1 = ( a-d + 3 + rnd ) >> 3;
> >         const int d2 = ( a-d + b-c + 4 - rnd ) >> 3;
> >         src[-2*stride] = clip_uint8(a-d1);
> >         src[-stride] = clip_uint8(b+d2);
> >         src[0] = clip_uint8(c-d2);
> >         src[stride] = clip_uint8(d+d1);
> >         src++;
> >     }
> > }
> >
> > but i might of course be wrong...
>
> They are almost correct (it should be read 'b-d2' and 'c+d2' instead)
oh! you're right. Typo.
> - except the rounding:
> original:
> 4-rnd
> 3+rnd
> 4-rnd
> 3+rnd
> yours:
> -3-rnd
> -4-rnd
> 4+rnd
> 3+rnd
hmm... I don't think so. The minus sign (in "a-d1") matters here: negating the shifted difference turns the 3+rnd bias into 4-rnd.
Btw, the new values for 'a' and 'd' clearly don't need [0..255] clipping,
since their kernels, (7,1)/8 and (1,7)/8, only have positive coeffs.
And no update is needed at all when d1 or d2 is zero.
e.g. =>
static void vc1_v_overlap_c(uint8_t* src, int stride, int rnd)
{
    int i;
    for(i = 0; i < 8; i++) {
        const int a = src[-2*stride];
        const int b = src[-stride];
        const int c = src[0];
        const int d = src[stride];
        const int d1 = ( a-d + 3 + rnd ) >> 3;
        const int d2 = ( a-d + b-c + 4 - rnd ) >> 3;
        if (d1) {   /* outer taps: result stays in [0..255], no clip */
            src[-2*stride] = a-d1;
            src[stride] = d+d1;
        }
        if (d2) {   /* inner taps can leave [0..255], so clip */
            src[-stride] = clip_uint8(b-d2);
            src[0] = clip_uint8(c+d2);
        }
        src++;
    }
}
bye!
Skal
for the record, let's be pragmatic:
#include <assert.h>
#include <stdio.h>

void Test_Overlap(void)
{
    int rnd, a,b,c,d;
    for(rnd=0; rnd<=1; ++rnd) {
        for(a=0; a<256; ++a) {
            for(b=0; b<256; ++b) {
                /* note: only c >= b and d >= a are scanned */
                for(c=b; c<256; ++c) {
                    for(d=a; d<256; ++d) {
                        /* original 4-tap formulas */
                        const int v1 = (7*a + d + 4 - rnd) >> 3;
                        const int v2 = (-a + 7*b + c + d + 3 + rnd) >> 3;
                        const int v3 = (a + b + 7*c - d + 4 - rnd) >> 3;
                        const int v4 = (a + 7*d + 3 + rnd) >> 3;
                        /* proposed delta form */
                        const int d1 = ( a-d + 3 + rnd ) >> 3;
                        const int d2 = ( a-d + b-c + 4 - rnd ) >> 3;
                        const int w1 = a-d1;
                        const int w2 = b-d2;
                        const int w3 = c+d2;
                        const int w4 = d+d1;
                        assert(v1==w1);
                        assert(v2==w2);
                        assert(v3==w3);
                        assert(v4==w4);
                    }
                }
            }
            printf(".");
        }
    }
}