SwScaler performance help (was Re: [MPlayer-dev-eng] [PATCH] vf_osd updates - fully baked?)

Jason Tackaberry tack at sault.org
Tue Sep 13 04:34:58 CEST 2005


On Mon, 2005-09-12 at 11:59 -0400, Jason Tackaberry wrote:
> > As I mentioned before, since you have to separate out the alpha in an
> > extra plane anyway, you can first do that and then scale. I think. Btw.
> > the swscaler can do the conversion and scaling in one step AFAIK.
> 
> It does.  I'll try to rework the code to use Swscaler.  I agree that
> it's just a better design that way.  I may have to ask for help. :)

Initial results are not very encouraging.  This approach, using
swscaler, is nearly 3 times slower than my current code.  My code will
convert a 640x480 BGRA image to 5 planes (luma, 2 chroma, luma alpha,
chroma alpha) in about 4200 usec.  Using swscaler to convert BGR32 to
YV12, then separating the alpha channel to a separate plane and using
swscaler to scale Y800 for luma and chroma alpha, this takes about 11500
usec.
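
For readers unfamiliar with the two alpha planes mentioned above, here is a minimal sketch in plain C of how they can be derived from a BGRA buffer. It is not the code from the patch. The function names, the stride parameters, and the choice of a rounded 2x2 box average for the chroma alpha are all assumptions made for illustration; it also assumes alpha is byte 3 of each pixel and that the width and height are even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the alpha byte (offset 3 of each BGRA pixel) into a full-resolution
 * plane.  Strides are in bytes.  Hypothetical helper, not from the patch. */
static void extract_alpha(const uint8_t *bgra, int bgra_stride,
                          uint8_t *alpha, int alpha_stride, int w, int h)
{
    for (int y = 0; y < h; y++) {
        const uint8_t *s = bgra + (size_t)y * bgra_stride;
        uint8_t *d = alpha + (size_t)y * alpha_stride;
        for (int x = 0; x < w; x++)
            d[x] = s[4 * x + 3];
    }
}

/* Build the half-resolution alpha plane used for chroma by taking the
 * rounded mean of each 2x2 block.  The box filter is an assumption; a
 * real scaler may use a different kernel. */
static void downsample_alpha_2x2(const uint8_t *alpha, int alpha_stride,
                                 uint8_t *half, int half_stride, int w, int h)
{
    assert(w % 2 == 0 && h % 2 == 0);
    for (int y = 0; y < h / 2; y++) {
        const uint8_t *r0 = alpha + (size_t)(2 * y) * alpha_stride;
        const uint8_t *r1 = r0 + alpha_stride;
        uint8_t *d = half + (size_t)y * half_stride;
        for (int x = 0; x < w / 2; x++) {
            int sum = r0[2 * x] + r0[2 * x + 1] + r1[2 * x] + r1[2 * x + 1];
            d[x] = (uint8_t)((sum + 2) >> 2);
        }
    }
}
```

For a 640x480 image this yields a 640x480 luma alpha plane and a 320x240 chroma alpha plane, matching the dimensions of the Y and U/V planes in YV12.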

Here's the code I'm using for swscaler.  In vf_config:

    sws_getFlagsAndFilterFromCmdLine(&sws_flags, &srcFilterParam,
                                     &dstFilterParam);
    priv->sws_bgr32 = sws_getContext(priv->w, priv->h, IMGFMT_BGR32, width, height, IMGFMT_YV12,
                                       get_sws_cpuflags() | sws_flags | SWS_PRINT_INFO,
                                       srcFilterParam, dstFilterParam, NULL);
    priv->sws_y800_l = sws_getContext(priv->w, priv->h, IMGFMT_Y800, width, height, IMGFMT_Y800,
                                       get_sws_cpuflags() | sws_flags | SWS_PRINT_INFO,
                                       srcFilterParam, dstFilterParam, NULL);
    priv->sws_y800_c = sws_getContext(priv->w, priv->h, IMGFMT_Y800, width>>1, height>>1, IMGFMT_Y800,
                                       get_sws_cpuflags() | sws_flags | SWS_PRINT_INFO,
                                       srcFilterParam, dstFilterParam, NULL);

Note that I'm testing with a fixed OSD, so that means priv->w == width
and priv->h == height.  (In other words, no scaling is happening except
for sws_y800_c.)

And for the conversion (it's messy, but it's just test code):

    /* Split the alpha byte (offset 3 of each BGRA pixel) into its own plane. */
    unsigned char *alpha = malloc(priv->w*priv->h);
    int i, j;
    for (i=3, j=0; i < priv->w * priv->h * 4; i+=4, j++)
        alpha[j] = priv->bgra_imgbuf[i];
    {
        /* BGR32 -> YV12: one packed source, three planar destinations. */
        uint8_t *src[3] = {priv->bgra_imgbuf, NULL, NULL};
        int src_strides[3] = {priv->w * 4, 0, 0};
        uint8_t *dst[3] = {priv->y, priv->u, priv->v};
        int dst_strides[3] = {priv->mpi_w, priv->mpi_w>>1, priv->mpi_w>>1};
        sws_scale_ordered(priv->sws_bgr32, src, src_strides, 0, priv->h, dst, dst_strides);
    }
    /* The alpha plane is the source for both Y800 passes below. */
    uint8_t *src[3] = {alpha, NULL, NULL};
    int src_strides[3] = {priv->w, 0, 0};
    {
        /* Luma alpha: same size as the Y plane. */
        uint8_t *dst[3] = {priv->a, NULL, NULL};
        int dst_strides[3] = {priv->w, 0, 0};
        sws_scale_ordered(priv->sws_y800_l, src, src_strides, 0, priv->h, dst, dst_strides);
    }
    {
        /* Chroma alpha: half size in each dimension. */
        uint8_t *dst[3] = {priv->uva, NULL, NULL};
        int dst_strides[3] = {priv->w>>1, 0, 0};
        sws_scale_ordered(priv->sws_y800_c, src, src_strides, 0, priv->h, dst, dst_strides);
    }
    free(alpha);

(Note the malloc/free isn't being included in the timings since it should be moved elsewhere.)
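
For anyone wanting to reproduce this kind of measurement, a minimal timing sketch in C follows. It is not the harness used for the numbers in this message. The helper names (now_usec, time_usec, count_call) are hypothetical, and it assumes a POSIX system where gettimeofday() is available. Setup such as malloc/free stays outside the timed region, as in the note above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in microseconds, via POSIX gettimeofday(). */
static long long now_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Average microseconds per call of fn(arg) over `iterations` timed runs.
 * One untimed warm-up call is made first so that cache and page-fault
 * effects of the first run do not skew the average. */
static double time_usec(void (*fn)(void *), void *arg, int iterations)
{
    assert(iterations > 0);
    fn(arg);
    long long start = now_usec();
    for (int i = 0; i < iterations; i++)
        fn(arg);
    return (double)(now_usec() - start) / iterations;
}

/* Stand-in workload for illustration: counts how often it was called.
 * In practice this would wrap the conversion being measured. */
static void count_call(void *arg)
{
    ++*(int *)arg;
}
```

Calling time_usec(count_call, &calls, 100) runs the workload 101 times (one warm-up plus 100 timed) and returns the mean time per timed call. Averaging over many iterations matters here because a single 640x480 conversion takes only a few milliseconds.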

Here are the info messages from swscaler:

        SwScaler: using unscaled Planar YV12 -> Planar YV12 special converter
        
        SwScaler: BICUBIC scaler, from Planar YV12 to Planar YV12 using MMX2
        
        SwScaler: BICUBIC scaler, from BGRA to Planar YV12 using MMX2
        SwScaler: using unscaled Planar Y800 -> Planar Y800 special converter
        
        SwScaler: BICUBIC scaler, from Planar Y800 to Planar Y800 using MMX2

(Note that I've aligned bgra_imgbuf.)

An increase from 4200 usec to 11500 usec is no small potatoes.  Am I
doing anything wrong?  I must be.  When I comment out the last two
scales and just do BGR32 to YV12, it's still slower (about 8000 usec).
I would have expected swscaler to be faster.

Cheers,
Jason.
