[MPlayer-dev-eng] [PATCH] VF Overlay
Jason Tackaberry
tack at urandom.ca
Sat Aug 8 16:02:58 CEST 2009
On Sat, 2009-08-08 at 15:03 +0200, Benjamin Zores wrote:
> + for (i = priv->w * rh; i > 0; i--, p_bgr32 += 4, p_alpha++, p_bgr24 += 3) {
> + /* Moving 32 bits is faster than 3 separate assignments (or one 16
> + * bit and and one 8 bit move). The BGR24 buffer has one extra byte
> + * allocated to prevent an overrun.
> + */
> + memcpy(p_bgr24, p_bgr32, 3);
> + *p_alpha = p_bgr32[3];
> + }
The original code was fairly reasonably benchmarked by me, as this code
path is well travelled and should be as fast as possible. I suppose for
the sake of just getting it into MPlayer we can worry about this later,
quite a bit slower than even copying 3 bytes separately.
When you get a chance, have a look at the AV_WN32 macro as Reimar
suggested.
I guess it would look like:
AV_WN32(p_bgr24, p_bgr32);
*p_alpha = p_bgr32[3];
> + "movq %%mm0, (%0) \n\t"
> + : "+&r" (dst_byte), "+r" (dst_alpha)
> + : "r" (byte), "r" (alpha), "r" (global_alpha)
> + );
Sorry, I got this backwards in my last email: dst_byte is written after
all inputs read, whereas dst_alpha isn't, so early clobber should go on
dst_alpha. Reimar, is that right?
> +static void blend_plane_MMX(int w, int slice_h, uint8_t *dst, uint8_t *src,
[...]
> + : "+r" (dst), "+r" (src), "+r" (overlay), "+r" (alpha)
> + : "m" (wr)
Reimar pointed out that all the above "+r" might need to be changed to
"+&r" as well. Since wr is read last, after all outputs are written, I
assume that's true.
> +static int control(struct vf_instance_s *vf, int request, void *data)
> +{
> + return vf_next_control(vf, request, data);
> +}
Now that this function does nothing, you can just remove it entirely
(and the reference to it in open()).
Thanks,
Jason.
More information about the MPlayer-dev-eng
mailing list