[Ffmpeg-devel] [RFC] VC1 Transform in AltiVec
Kostya
kostya.shishkov
Wed Jul 19 06:23:59 CEST 2006
On Tue, Jul 18, 2006 at 12:05:58PM +0200, Michael Niedermayer wrote:
> Hi
>
> On Tue, Jul 18, 2006 at 06:46:23AM +0300, Kostya wrote:
> > Here is my first attempt to optimize something with processor-specific instructions.
> > A patch to vc1.c provided.
> >
> > Please note that:
> > a) It is AltiVec-only, so don't try to compile it on x86 or on a machine without AltiVec support
> > b) It's just a hack to demonstrate that it works; in the future this will go to ppc/vc1_altivec.c
> >
> > TRANSPOSE8() macro was taken from ppc/mpegvideo_altivec.c
> >
> > I'd like to hear from people who know this stuff whether I took the right approach (and any
> > further suggestions for optimization).
> >
> > MMX version will follow.
>
> > --- vc1_svn.c 2006-07-16 07:47:53.000000000 +0300
> > +++ vc1.c 2006-07-17 19:09:12.000000000 +0300
> > @@ -716,6 +716,192 @@
> > return 0;
> > }
> >
> > +#define TRANSPOSE8(a,b,c,d,e,f,g,h) \
> > +do { \
> > + __typeof__(a) _A1, _B1, _C1, _D1, _E1, _F1, _G1, _H1; \
> > + __typeof__(a) _A2, _B2, _C2, _D2, _E2, _F2, _G2, _H2; \
>
> stuff beginning with _ is reserved in C ...
As I said, that's not my code. It looks like those names are used to declare temporaries
with the same type as the macro arguments.
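
For context, here is a minimal sketch of what such declarations do, assuming GCC or a compatible compiler (`__typeof__` is a GNU extension). `SWAP_SAME_TYPE`, `swap_tmp_` and `swap_demo` are made-up names for illustration and do not come from ppc/mpegvideo_altivec.c. The temporary ends in an underscore instead of starting with one, because C reserves identifiers that begin with an underscore followed by an uppercase letter.

```c
/* Hypothetical helper, not FFmpeg code: swap two lvalues of the same type.
 * __typeof__(a) gives the temporary the type of the macro argument, which is
 * what the _A1.._H2 declarations in TRANSPOSE8 are used for. */
#define SWAP_SAME_TYPE(a, b)          \
do {                                  \
    __typeof__(a) swap_tmp_ = (a);    \
    (a) = (b);                        \
    (b) = swap_tmp_;                  \
} while (0)

/* Small self-check: the same macro works for short and for double. */
int swap_demo(void)
{
    short  x = 1, y = 2;
    double p = 1.5, q = -2.5;

    SWAP_SAME_TYPE(x, y);
    SWAP_SAME_TYPE(p, q);
    return x == 2 && y == 1 && p == -2.5 && q == 1.5;
}
```

The same renaming would apply to the sixteen temporaries in TRANSPOSE8 without changing what the macro does.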
>
[...]
>
> > + ssrc7 = vec_ld(112, block);
> > +
> > + TRANSPOSE8(ssrc0, ssrc1, ssrc2, ssrc3, ssrc4, ssrc5, ssrc6, ssrc7);
>
> the TRANSPOSE is unneeded, the scantables can be transposed to get the same
> effect
I'm not sure about this. The transpose looks like the simplest way to do the horizontal
transform with AltiVec.
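
A sketch of what transposing the scantable could mean in practice, under the assumption that scantable entries index an 8x8 block as row * 8 + column. If coefficients are stored through a table whose entries have row and column swapped, the block arrives in memory already transposed. `transpose_scantable` and `transposed_scan_matches` are hypothetical helpers, not existing FFmpeg functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a scantable that stores each coefficient at the
 * transposed position. Assumes entries index an 8x8 block as row * 8 + col. */
void transpose_scantable(uint8_t dst[64], const uint8_t src[64])
{
    for (int i = 0; i < 64; i++) {
        unsigned pos = src[i];
        assert(pos < 64);
        dst[i] = (uint8_t)(((pos & 7) << 3) | (pos >> 3)); /* swap row and column */
    }
}

/* Self-check: filling a block through the transposed table gives the same
 * result as filling it through the original table and transposing afterwards. */
int transposed_scan_matches(const uint8_t scan[64])
{
    uint8_t tscan[64];
    int16_t a[64] = {0}, b[64] = {0};

    transpose_scantable(tscan, scan);
    for (int i = 0; i < 64; i++) {
        a[scan[i]]  = (int16_t)(i + 1);   /* normal layout */
        b[tscan[i]] = (int16_t)(i + 1);   /* transposed layout */
    }
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            if (a[r * 8 + c] != b[c * 8 + r])
                return 0;
    return 1;
}
```

The table would be transposed once at init time, so the per-block TRANSPOSE8() cost would disappear from the decoding loop.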
>
>
[...]
> > +
> > + STEP8(s0, s1, s2, s3, s4, s5, s6, s7, vec_4);
> > + SHIFT_HOR(s0, s1, s2, s3, s4, s5, s6, s7);
> > + STEP8(s8, s9, sA, sB, sC, sD, sE, sF, vec_4);
> > + SHIFT_HOR(s8, s9, sA, sB, sC, sD, sE, sF);
>
> the horizontal transform fits in 16bit as is so no unpack/pack is needed
Oh, that's nice.
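
A rough way to see how much 16-bit headroom the row pass has is to compute its worst-case gain, that is the largest sum of absolute coefficients over the eight outputs. The sketch below does that with unit impulses. All function names are hypothetical; the coefficients are the 12/16/6 and 16/15/9/4 families from this thread (the odd rows are the halved formulas quoted further down, doubled). The bound is only a sufficient condition, and the coefficient range a real VC-1 stream produces is not stated in this thread.

```c
#include <stdlib.h>

/* Linear part of the 8-point row pass (rounding constant and shift left out). */
static void row_linear(const int s[8], int out[8])
{
    int e0 = 12 * (s[0] + s[4]);
    int e1 = 12 * (s[0] - s[4]);
    int e2 = 16 * s[2] +  6 * s[6];
    int e3 =  6 * s[2] - 16 * s[6];
    int o0 = 16 * s[1] + 15 * s[3] +  9 * s[5] +  4 * s[7];
    int o1 = 15 * s[1] -  4 * s[3] - 16 * s[5] -  9 * s[7];
    int o2 =  9 * s[1] - 16 * s[3] +  4 * s[5] + 15 * s[7];
    int o3 =  4 * s[1] -  9 * s[3] + 15 * s[5] - 16 * s[7];

    out[0] = e0 + e2 + o0;  out[7] = e0 + e2 - o0;
    out[1] = e1 + e3 + o1;  out[6] = e1 + e3 - o1;
    out[2] = e1 - e3 + o2;  out[5] = e1 - e3 - o2;
    out[3] = e0 - e2 + o3;  out[4] = e0 - e2 - o3;
}

/* Largest sum of absolute coefficients over the 8 outputs, found by feeding
 * unit impulses through the linear pass. */
int row_worst_case_gain(void)
{
    int gain[8] = {0}, worst = 0;

    for (int j = 0; j < 8; j++) {
        int s[8] = {0}, out[8];
        s[j] = 1;
        row_linear(s, out);
        for (int k = 0; k < 8; k++)
            gain[k] += abs(out[k]);
    }
    for (int k = 0; k < 8; k++)
        if (gain[k] > worst)
            worst = gain[k];
    return worst;
}

/* Largest |input| for which gain * |input| + rounding still fits in int16_t. */
int max_safe_input(int gain, int rounding)
{
    return (32767 - rounding) / gain;
}
```

Inputs larger than this bound do not necessarily overflow, since the bound assumes every coefficient has the worst possible sign.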
[...]
> > + sA = vec_unpackh(ssrc2);
> > + sB = vec_unpackh(ssrc3);
> > + sC = vec_unpackh(ssrc4);
> > + sD = vec_unpackh(ssrc5);
> > + sE = vec_unpackh(ssrc6);
> > + sF = vec_unpackh(ssrc7);
> > + STEP8(s0, s1, s2, s3, s4, s5, s6, s7, vec_4);
> > + SHIFT_VERT(s0, s1, s2, s3, s4, s5, s6, s7);
> > + STEP8(s8, s9, sA, sB, sC, sD, sE, sF, vec_4);
> > + SHIFT_VERT(s8, s9, sA, sB, sC, sD, sE, sF);
>
> the vertical transform can also be done in 16bit though it's a little trickier
>
> t1 = 6 * (src[ 0] + src[32]);
> t2 = 6 * (src[ 0] - src[32]);
> t3 = 8 * src[16] + 3 * src[48];
> t4 = 3 * src[16] - 8 * src[48];
>
> t5 = t1 + t3;
> t6 = t2 + t4;
> t7 = t2 - t4;
> t8 = t1 - t3;
>
> t1 = (8 * src[ 8] + 8 * src[24] + 4 * src[40] + 2 * src[56]) + ((- src[24] + src[40])>>1);
> t2 = (8 * src[ 8] - 2 * src[24] - 8 * src[40] - 4 * src[56]) + ((- src[ 8] - src[56])>>1);
> t3 = (4 * src[ 8] - 8 * src[24] + 2 * src[40] + 8 * src[56]) + (( src[ 8] - src[56])>>1);
> t4 = (2 * src[ 8] - 4 * src[24] + 8 * src[40] - 8 * src[56]) + ((- src[24] - src[40])>>1);
>
> dst[ 0] = (t5 + t1 + 32) >> 6;
> dst[ 8] = (t6 + t2 + 32) >> 6;
> dst[16] = (t7 + t3 + 32) >> 6;
> dst[24] = (t8 + t4 + 32) >> 6;
> dst[32] = (t8 - t4 + 32) >> 6;
> dst[40] = (t7 - t3 + 32) >> 6;
> dst[48] = (t6 - t2 + 32) >> 6;
> dst[56] = (t5 - t1 + 32) >> 6;
>
> it's also interesting to note that Microsoft must be aware of this due to the
> way rounding is done on the second half of coeffs but they apparently
> don't mention it in the spec ... i am wondering what other stuff they have
> hidden ...
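
The rounding remark can be checked in isolation. The sketch below assumes that the full-precision transform adds an extra 1 to the second four outputs before its final shift, which is how the remark reads but is not spelled out in this thread. With an even part e (always even, since 12, 16 and 6 are even) and an odd remainder x, it compares dropping one bit from the full sum against halving e and adding or subtracting x >> 1. `halving_is_exact` is a hypothetical name, and >> on negative values is assumed to be an arithmetic shift, which C leaves implementation-defined.

```c
/* Returns 1 if, for every even e in [-2*limit, 2*limit] and every x in
 * [-limit, limit], the halved forms match the full-precision forms. */
int halving_is_exact(int limit)
{
    for (int e = -2 * limit; e <= 2 * limit; e += 2) {
        for (int x = -limit; x <= limit; x++) {
            /* first four outputs: (e + x) with one bit dropped */
            if (((e + x) >> 1) != e / 2 + (x >> 1))
                return 0;
            /* last four outputs: (e - x + 1) with one bit dropped;
             * the +1 is what makes subtracting (x >> 1) exact */
            if (((e - x + 1) >> 1) != e / 2 - (x >> 1))
                return 0;
        }
    }
    return 1;
}
```

Without the +1 the second identity fails for odd x, for example (10 - 3) >> 1 is 3 while (10 >> 1) - (3 >> 1) is 4.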
>
> and the + 32 can be added to t1/t2 instead of the end
Well, here is my version converted back to C:
t1 = ((src[0] + src[4]) << 2) * 3 + 4;
t2 = ((src[0] - src[4]) << 2) * 3 + 4;
t3 = ((src[6] * 3) << 1) + (src[2] << 4);
t4 = ((src[2] * 3) << 1) - (src[6] << 4);
t5 = t1 + t3;
t6 = t2 + t4;
t7 = t2 - t4;
t8 = t1 - t3;
// t1 = 16 * src[1] + 15 * src[3] + 9 * src[5] + 4 * src[7]
t1 = ((((((src[1] + src[3]) << 1) + src[5]) << 1) + src[7]) << 2) + src[5] - src[3];
...etc
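
A small sketch to check the two things this version relies on: that the shift-and-add expression for the odd t1 matches the 16/15/9/4 comment, and that a rounding constant added to the even t1 and t2 reaches all of t5..t8. The function names and the MUL_POW2 macro are hypothetical. MUL_POW2 stands in for << because left-shifting a negative value is undefined in C; compilers still emit a shift for it.

```c
#include <stdint.h>

/* x * 2^n without left-shifting a possibly negative value. */
#define MUL_POW2(x, n) ((x) * (1 << (n)))

/* Odd t1 with plain multiplications, as in the comment. */
int odd_t1_mul(const int16_t *src)
{
    return 16 * src[1] + 15 * src[3] + 9 * src[5] + 4 * src[7];
}

/* Same value built from shifts and adds only. */
int odd_t1_shift_add(const int16_t *src)
{
    int a = MUL_POW2(src[1] + src[3], 1) + src[5];  /* 2*s1 + 2*s3 + s5        */
    int b = MUL_POW2(a, 1) + src[7];                /* 4*s1 + 4*s3 + 2*s5 + s7 */
    return MUL_POW2(b, 2) + src[5] - src[3];        /* 16*s1 + 15*s3 + 9*s5 + 4*s7 */
}

/* Even part with the rounding constant folded into t1 and t2.
 * t[0..3] receive t5, t6, t7, t8. */
void even_part(const int16_t *src, int round, int t[4])
{
    int t1 = 12 * (src[0] + src[4]) + round;
    int t2 = 12 * (src[0] - src[4]) + round;
    int t3 = 16 * src[2] +  6 * src[6];
    int t4 =  6 * src[2] - 16 * src[6];

    t[0] = t1 + t3;   /* t5 */
    t[1] = t2 + t4;   /* t6 */
    t[2] = t2 - t4;   /* t7 */
    t[3] = t1 - t3;   /* t8 */
}
```

Each of t5..t8 contains exactly one of t1 or t2 with a positive sign, so the constant only needs to be added twice per row instead of eight times.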
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> In the past you could go to a library and read, borrow or copy any book
> Today you'd get arrested for mere telling someone where the library is
>