[FFmpeg-devel] [RFC] Alpha support
Michael Niedermayer
michaelni
Sat Jan 31 21:10:41 CET 2009
On Wed, Jan 28, 2009 at 09:10:23PM +0100, C?dric Schieli wrote:
> Hello,
>
> Attached are two versions of the patch :
> > sws_use_alpha_bloated.patch ensures that no extra code is reached for the
> > non alpha case, at the cost of more bloated code (swscale.o is 9kB bigger in
> > my build)
> > sws_use_alpha_slower.patch only eliminates extra code when the destination
> > format is not alpha capable, and uses a runtime test for the remaining cases
> >
>
> Attached version of sws_use_alpha.patch combines those two cases. The
> "bloated" case is the default, the "slower" one is selected with
> --enable-small
>
> The cloberring of %ebp in yuv2packed2 has also been addressed.
changelog entry missing
[...]
> Index: ffmpeg/libswscale/swscale.c
> ===================================================================
> --- ffmpeg.orig/libswscale/swscale.c 2009-01-28 21:08:48.712047316 +0100
> +++ ffmpeg/libswscale/swscale.c 2009-01-28 21:08:53.627047128 +0100
> @@ -551,13 +551,14 @@
> }
> }
>
> -#define YSCALE_YUV_2_PACKEDX_NOCLIP_C(type) \
> +#define YSCALE_YUV_2_PACKEDX_NOCLIP_C(type,alpha) \
> for (i=0; i<(dstW>>1); i++){\
> int j;\
> int Y1 = 1<<18;\
> int Y2 = 1<<18;\
> int U = 1<<18;\
> int V = 1<<18;\
> + int av_unused A1, A2;\
> type av_unused *r, *b, *g;\
> const int i2= 2*i;\
> \
> @@ -575,9 +576,21 @@
> Y2>>=19;\
> U >>=19;\
> V >>=19;\
> + if (alpha && (!CONFIG_SMALL || c->alpPixBuf))\
> + {\
{
placement is inconsistant and not K&R style (yes i know sws is
somewhat messy in terms of following any common style, but new code should be
approximately K&R style, doesnt need to follow all fineprint but the {} should
be placed in K&R style)
also this shouldnt be using c->alpPixBuf not in either case because
the c-> needs an pointer dereference (unless the compiler is smart but
experience says the compiler generally is not)
> + A1 = 1<<18;\
> + A2 = 1<<18;\
> + for (j=0; j<lumFilterSize; j++)\
> + {\
> + A1 += alpSrc[j][i2 ] * lumFilter[j];\
> + A2 += alpSrc[j][i2+1] * lumFilter[j];\
> + }\
> + A1>>=19;\
> + A2>>=19;\
> + }\
the int A1,A2 can be moved in the if()
>
> -#define YSCALE_YUV_2_PACKEDX_C(type) \
> - YSCALE_YUV_2_PACKEDX_NOCLIP_C(type)\
> +#define YSCALE_YUV_2_PACKEDX_C(type,alpha) \
> + YSCALE_YUV_2_PACKEDX_NOCLIP_C(type,alpha)\
> if ((Y1|Y2|U|V)&256)\
> {\
> if (Y1>255) Y1=255; \
> @@ -588,14 +601,22 @@
> else if (U<0) U=0; \
> if (V>255) V=255; \
> else if (V<0) V=0; \
> + }\
> + if (alpha && (!CONFIG_SMALL || c->alpPixBuf) && ((A1|A2)&256))\
> + {\
> + if (A1>255) A1=255; \
> + else if (A1<0)A1=0; \
> + if (A2>255) A2=255; \
> + else if (A2<0)A2=0; \
av_clip_uint8()
[...]
> @@ -625,6 +653,11 @@
> if (B>=(256<<22)) B=(256<<22)-1; \
> else if (B<0)B=0; \
> }\
> + if (alpha && (!CONFIG_SMALL || c->alpPixBuf) && (A&0xC0000000))\
> + {\
> + if (A>=(256<<22)) A=(256<<22)-1; \
> + else if (A<0)A=0; \
these 2 can be verically aligned prettier
[...]
> - case PIX_FMT_RGB32:\
> - case PIX_FMT_BGR32:\
> - case PIX_FMT_RGB32_1:\
> - case PIX_FMT_BGR32_1:\
> - func(uint32_t)\
> - ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1];\
> - ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2];\
> - } \
> + case PIX_FMT_RGBA:\
> + case PIX_FMT_BGRA:\
> + if (!CONFIG_SMALL && c->alpPixBuf)\
> + {\
> + func(uint32_t,1)\
> + ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1] + (A1<<24);\
> + ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2] + (A2<<24);\
> + }\
> + }else{\
> + func(uint32_t,CONFIG_SMALL)\
> + ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1] + (CONFIG_SMALL ? (A1<<24) : 0);\
> + ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2] + (CONFIG_SMALL ? (A2<<24) : 0);\
> + }\
> + }\
> + break;\
> + case PIX_FMT_ARGB:\
> + case PIX_FMT_ABGR:\
is it faster the way you wrote it compared to a table that does <<24 vs. <<0 ?
iam asking because the table would lead to simpler and less duplicated code
[...]
> - YSCALE_YUV_2_RGBX_FULL_C(1<<21)
> - dest[aidx]= 0;
> + if (!CONFIG_SMALL && c->alpPixBuf){
> + YSCALE_YUV_2_RGBX_FULL_C(1<<21,1)
> + dest[aidx]= A>>22;
> + dest[0]= B>>22;
> + dest[1]= G>>22;
> + dest[2]= R>>22;
> + dest+= step;
> + }
> + }else{
> + YSCALE_YUV_2_RGBX_FULL_C(1<<21,CONFIG_SMALL)
> + dest[aidx]= CONFIG_SMALL ? A>>22 : 0;
> dest[0]= B>>22;
> dest[1]= G>>22;
> dest[2]= R>>22;
> dest+= step;
> }
> + }
Why dont you write it like
+ if (!CONFIG_SMALL && c->alpPixBuf){
+ YSCALE_YUV_2_RGBX_FULL_C(1<<21,1)
+ dest[aidx]= A>>22;
+ dest[0]= B>>22;
+ dest[1]= G>>22;
+ dest[2]= R>>22;
+ dest+= step;
+ }
+ }else{
+ YSCALE_YUV_2_RGBX_FULL_C(1<<21,CONFIG_SMALL ? c->alpPixBuf : 0)
+ dest[aidx]= CONFIG_SMALL ? A>>22 : 0;
dest[0]= B>>22;
dest[1]= G>>22;
dest[2]= R>>22;
dest+= step;
}
+ }
and
+#define YSCALE_YUV_2_RGBX_FULL_C(rnd,alpha) \
+ YSCALE_YUV_2_PACKEDX_FULL_C(alpha)\
Y-= c->yuv2rgb_y_offset;\
Y*= c->yuv2rgb_y_coeff;\
Y+= rnd;\
@@ -625,6 +653,11 @@
if (B>=(256<<22)) B=(256<<22)-1; \
else if (B<0)B=0; \
}\
+ if (alpha)\
?
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090131/c5de9576/attachment.pgp>
More information about the ffmpeg-devel
mailing list