[FFmpeg-devel] [RFC] Alpha support

Sat Jan 31 21:10:41 CET 2009

On Wed, Jan 28, 2009 at 09:10:23PM +0100, C?dric Schieli wrote:
> Hello,
> 
> Attached are two versions of the patch :
> > sws_use_alpha_bloated.patch ensures that no extra code is reached for the
> > non alpha case, at the cost of more bloated code (swscale.o is 9kB bigger in
> > my build)
> > sws_use_alpha_slower.patch only eliminates extra code when the destination
> > format is not alpha capable, and uses a runtime test for the remaining cases
> >
> 
> Attached version of sws_use_alpha.patch combines those two cases. The
> "bloated" case is the default, the "slower" one is selected with
> --enable-small
> 
> The cloberring of %ebp in yuv2packed2 has also been addressed.


changelog entry missing

[...]

> Index: ffmpeg/libswscale/swscale.c
> ===================================================================
> --- ffmpeg.orig/libswscale/swscale.c	2009-01-28 21:08:48.712047316 +0100
> +++ ffmpeg/libswscale/swscale.c	2009-01-28 21:08:53.627047128 +0100
> @@ -551,13 +551,14 @@
>          }
>  }
>  
> -#define YSCALE_YUV_2_PACKEDX_NOCLIP_C(type) \
> +#define YSCALE_YUV_2_PACKEDX_NOCLIP_C(type,alpha) \
>      for (i=0; i<(dstW>>1); i++){\
>          int j;\
>          int Y1 = 1<<18;\
>          int Y2 = 1<<18;\
>          int U  = 1<<18;\
>          int V  = 1<<18;\
> +        int av_unused A1, A2;\
>          type av_unused *r, *b, *g;\
>          const int i2= 2*i;\
>          \
> @@ -575,9 +576,21 @@
>          Y2>>=19;\
>          U >>=19;\
>          V >>=19;\
> +        if (alpha && (!CONFIG_SMALL || c->alpPixBuf))\
> +        {\

{ 
placement is inconsistant and not K&R style (yes i know sws is
somewhat messy in terms of following any common style, but new code should be
approximately K&R style, doesnt need to follow all fineprint but the {} should
be placed in K&R style)

also this shouldnt be using c->alpPixBuf not in either case because
the c-> needs an pointer dereference (unless the compiler is smart but
experience says the compiler generally is not)


> +            A1 = 1<<18;\
> +            A2 = 1<<18;\
> +            for (j=0; j<lumFilterSize; j++)\
> +            {\
> +                A1 += alpSrc[j][i2  ] * lumFilter[j];\
> +                A2 += alpSrc[j][i2+1] * lumFilter[j];\
> +            }\
> +            A1>>=19;\
> +            A2>>=19;\
> +        }\

the int A1,A2 can be moved in the if()


>  
> -#define YSCALE_YUV_2_PACKEDX_C(type) \
> -        YSCALE_YUV_2_PACKEDX_NOCLIP_C(type)\
> +#define YSCALE_YUV_2_PACKEDX_C(type,alpha) \
> +        YSCALE_YUV_2_PACKEDX_NOCLIP_C(type,alpha)\
>          if ((Y1|Y2|U|V)&256)\
>          {\
>              if (Y1>255)   Y1=255; \

> @@ -588,14 +601,22 @@
>              else if (U<0) U=0;    \
>              if (V>255)    V=255;  \
>              else if (V<0) V=0;    \
> +        }\
> +        if (alpha && (!CONFIG_SMALL || c->alpPixBuf) && ((A1|A2)&256))\
> +        {\

> +            if (A1>255)   A1=255; \
> +            else if (A1<0)A1=0;   \
> +            if (A2>255)   A2=255; \
> +            else if (A2<0)A2=0;   \

av_clip_uint8()


[...]
> @@ -625,6 +653,11 @@
>              if (B>=(256<<22))   B=(256<<22)-1; \
>              else if (B<0)B=0;   \
>          }\
> +        if (alpha && (!CONFIG_SMALL || c->alpPixBuf) && (A&0xC0000000))\
> +        {\
> +            if (A>=(256<<22))   A=(256<<22)-1; \
> +            else if (A<0)A=0;   \

these 2 can be verically aligned prettier


[...]
> -    case PIX_FMT_RGB32:\
> -    case PIX_FMT_BGR32:\
> -    case PIX_FMT_RGB32_1:\
> -    case PIX_FMT_BGR32_1:\
> -        func(uint32_t)\
> -            ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1];\
> -            ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2];\
> -        }                \
> +    case PIX_FMT_RGBA:\
> +    case PIX_FMT_BGRA:\
> +        if (!CONFIG_SMALL && c->alpPixBuf)\
> +        {\
> +            func(uint32_t,1)\
> +                ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1] + (A1<<24);\
> +                ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2] + (A2<<24);\
> +            }\
> +        }else{\
> +            func(uint32_t,CONFIG_SMALL)\
> +                ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1] + (CONFIG_SMALL ? (A1<<24) : 0);\
> +                ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2] + (CONFIG_SMALL ? (A2<<24) : 0);\
> +            }\
> +        }\
> +        break;\
> +    case PIX_FMT_ARGB:\
> +    case PIX_FMT_ABGR:\

is it faster the way you wrote it compared to a table that does <<24 vs. <<0 ?
iam asking because the table would lead to simpler and less duplicated code


[...]
> -        YSCALE_YUV_2_RGBX_FULL_C(1<<21)
> -            dest[aidx]= 0;
> +        if (!CONFIG_SMALL && c->alpPixBuf){
> +            YSCALE_YUV_2_RGBX_FULL_C(1<<21,1)
> +                dest[aidx]= A>>22;
> +                dest[0]= B>>22;
> +                dest[1]= G>>22;
> +                dest[2]= R>>22;
> +                dest+= step;
> +            }
> +        }else{
> +        YSCALE_YUV_2_RGBX_FULL_C(1<<21,CONFIG_SMALL)
> +            dest[aidx]= CONFIG_SMALL ? A>>22 : 0;
>              dest[0]= B>>22;
>              dest[1]= G>>22;
>              dest[2]= R>>22;
>              dest+= step;
>          }
> +        }

Why dont you write it like

+        if (!CONFIG_SMALL && c->alpPixBuf){
+            YSCALE_YUV_2_RGBX_FULL_C(1<<21,1)
+                dest[aidx]= A>>22;
+                dest[0]= B>>22;
+                dest[1]= G>>22;
+                dest[2]= R>>22;
+                dest+= step;
+            }
+        }else{
+        YSCALE_YUV_2_RGBX_FULL_C(1<<21,CONFIG_SMALL ? c->alpPixBuf : 0)
+            dest[aidx]= CONFIG_SMALL ? A>>22 : 0;
             dest[0]= B>>22;
             dest[1]= G>>22;
             dest[2]= R>>22;
             dest+= step;
         }
+        }

and

+#define YSCALE_YUV_2_RGBX_FULL_C(rnd,alpha) \
+    YSCALE_YUV_2_PACKEDX_FULL_C(alpha)\
         Y-= c->yuv2rgb_y_offset;\
         Y*= c->yuv2rgb_y_coeff;\
         Y+= rnd;\
@@ -625,6 +653,11 @@
             if (B>=(256<<22))   B=(256<<22)-1; \
             else if (B<0)B=0;   \
         }\
+        if (alpha)\

?


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090131/c5de9576/attachment.pgp>