[FFmpeg-devel] [PATCH] avfilter/vf_blend: add x86 SIMD for some modes
Henrik Gramner
henrik at gramner.com
Fri Oct 2 19:48:24 CEST 2015
On Fri, Oct 2, 2015 at 6:57 PM, Paul B Mahol <onemda at gmail.com> wrote:
> +INIT_XMM sse2
> +cglobal blend_xor, 9, 10, 2, 0, top, top_linesize, bottom, bottom_linesize, dst, dst_linesize, width, start, end
[...]
> +cglobal blend_or, 9, 10, 2, 0, top, top_linesize, bottom, bottom_linesize, dst, dst_linesize, width, start, end
[...]
> +cglobal blend_and, 9, 10, 2, 0, top, top_linesize, bottom, bottom_linesize, dst, dst_linesize, width, start, end
You could do those with the floating-point bitwise instructions (xorps, orps,
andps) instead; then they only require SSE rather than SSE2 (and AVX rather
than AVX2 if you want to make versions using ymm registers).
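To illustrate why the float instructions are interchangeable here, the following is a minimal C sketch that uses intrinsics as a stand-in for the x86inc assembly. The helper names (xor16_pxor, xor16_xorps, xor_variants_agree) are hypothetical and not taken from the patch. It shows that an XOR done with the SSE-only float intrinsic _mm_xor_ps yields exactly the same bytes as the SSE2 integer _mm_xor_si128, because the operation is purely bitwise and never interprets the data as floats.

```c
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in the SSE-only xmmintrin.h */
#include <stdint.h>
#include <string.h>

/* Integer-domain XOR of 16 bytes: pxor on xmm registers needs SSE2. */
static void xor16_pxor(const uint8_t *a, const uint8_t *b, uint8_t *dst)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_xor_si128(va, vb));
}

/* Same XOR using only SSE float intrinsics (movups + xorps). The bytes are
 * copied through float arrays unchanged; xorps is bitwise, so the float
 * interpretation of the data never matters. */
static void xor16_xorps(const uint8_t *a, const uint8_t *b, uint8_t *dst)
{
    float fa[4], fb[4], fr[4];
    memcpy(fa, a, sizeof fa);
    memcpy(fb, b, sizeof fb);
    _mm_storeu_ps(fr, _mm_xor_ps(_mm_loadu_ps(fa), _mm_loadu_ps(fb)));
    memcpy(dst, fr, sizeof fr);
}

/* Returns 1 if both variants produce identical bytes over a range of inputs. */
static int xor_variants_agree(void)
{
    uint8_t a[16], b[16], r1[16], r2[16];
    for (int k = 0; k < 256; k++) {
        for (int i = 0; i < 16; i++) {
            a[i] = (uint8_t)(k + i * 17);
            b[i] = (uint8_t)(k * 3 + i * 101);
        }
        xor16_pxor(a, b, r1);
        xor16_xorps(a, b, r2);
        if (memcmp(r1, r2, sizeof r1) != 0)
            return 0;
    }
    return 1;
}
```

The same reasoning applies to orps/andps versus por/pand.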
> +cglobal blend_addition, 9, 10, 3, 0, top, top_linesize, bottom, bottom_linesize, dst, dst_linesize, width, start, end
[...]
> + punpcklbw m0, m2
> + punpcklbw m1, m2
> + paddw m0, m1
> + packuswb m0, m0
> + movh [dstq + x], m0
> + add r10q, mmsize / 2
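Use paddusb here instead: it adds unsigned bytes with saturation directly, so
there is no need to unpack to words and pack back.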
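As a sanity check on the paddusb suggestion, here is a minimal C intrinsics sketch (a stand-in for the asm; the helper names are hypothetical) comparing the patch's punpcklbw/paddw/packuswb sequence with a single _mm_adds_epu8 (paddusb). It assumes, as in the other functions, that m0 holds top and m1 holds bottom. Both compute min(top + bottom, 255), since a sum of at most 510 fits in a word and packuswb clamps it to 255; the paddusb version also handles 16 pixels per register instead of 8.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Sequence from the patch: widen 8 pixels to words, add, pack back with
 * unsigned saturation (punpcklbw, paddw, packuswb). */
static void add8_unpack(const uint8_t *top, const uint8_t *bottom, uint8_t *dst)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i t = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)top), zero);
    __m128i b = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)bottom), zero);
    __m128i sum = _mm_add_epi16(t, b);                            /* 0..510 fits in a word */
    _mm_storel_epi64((__m128i *)dst, _mm_packus_epi16(sum, sum)); /* clamps to 255 */
}

/* Suggested replacement: paddusb adds unsigned bytes with saturation,
 * so a full register of 16 pixels needs no unpack/pack at all. */
static void add16_paddusb(const uint8_t *top, const uint8_t *bottom, uint8_t *dst)
{
    __m128i t = _mm_loadu_si128((const __m128i *)top);
    __m128i b = _mm_loadu_si128((const __m128i *)bottom);
    _mm_storeu_si128((__m128i *)dst, _mm_adds_epu8(t, b));
}

/* Checks all 65536 (top, bottom) pairs against min(top + bottom, 255). */
static int addition_variants_agree(void)
{
    uint8_t top[16], bottom[16], r1[16], r2[16];
    for (int a = 0; a < 256; a++) {
        for (int base = 0; base < 256; base += 16) {
            for (int i = 0; i < 16; i++) {
                top[i] = (uint8_t)a;
                bottom[i] = (uint8_t)(base + i);
            }
            add8_unpack(top, bottom, r1);             /* low 8 pixels */
            add8_unpack(top + 8, bottom + 8, r1 + 8); /* high 8 pixels */
            add16_paddusb(top, bottom, r2);
            for (int i = 0; i < 16; i++) {
                int sum = top[i] + bottom[i];
                int expect = sum > 255 ? 255 : sum;
                if (r1[i] != expect || r2[i] != expect)
                    return 0;
            }
        }
    }
    return 1;
}
```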
> +cglobal blend_subtract, 9, 10, 3, 0, top, top_linesize, bottom, bottom_linesize, dst, dst_linesize, width, start, end
[...]
> + punpcklbw m0, m2
> + punpcklbw m1, m2
> + psubw m0, m1
> + packuswb m0, m0
Likewise, psubusb does the unsigned byte subtraction with saturation at zero,
without the unpack/pack.
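The subtract case can be checked the same way. This is a minimal C intrinsics sketch with hypothetical helper names, again assuming m0 holds top and m1 holds bottom. The word difference lies in -255..255, and packuswb turns negative words into 0, so the patch's sequence computes max(top - bottom, 0), which is exactly what _mm_subs_epu8 (psubusb) produces in one instruction.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Patch sequence: widen, psubw, then packuswb clamps negative words to 0. */
static void sub8_unpack(const uint8_t *top, const uint8_t *bottom, uint8_t *dst)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i t = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)top), zero);
    __m128i b = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)bottom), zero);
    __m128i diff = _mm_sub_epi16(t, b); /* -255..255 */
    _mm_storel_epi64((__m128i *)dst, _mm_packus_epi16(diff, diff));
}

/* Suggested replacement: psubusb, 16 pixels per register. */
static void sub16_psubusb(const uint8_t *top, const uint8_t *bottom, uint8_t *dst)
{
    __m128i t = _mm_loadu_si128((const __m128i *)top);
    __m128i b = _mm_loadu_si128((const __m128i *)bottom);
    _mm_storeu_si128((__m128i *)dst, _mm_subs_epu8(t, b));
}

/* Checks all 65536 (top, bottom) pairs against max(top - bottom, 0). */
static int subtraction_variants_agree(void)
{
    uint8_t top[16], bottom[16], r1[16], r2[16];
    for (int a = 0; a < 256; a++) {
        for (int base = 0; base < 256; base += 16) {
            for (int i = 0; i < 16; i++) {
                top[i] = (uint8_t)a;
                bottom[i] = (uint8_t)(base + i);
            }
            sub8_unpack(top, bottom, r1);             /* low 8 pixels */
            sub8_unpack(top + 8, bottom + 8, r1 + 8); /* high 8 pixels */
            sub16_psubusb(top, bottom, r2);
            for (int i = 0; i < 16; i++) {
                int diff = top[i] - bottom[i];
                int expect = diff < 0 ? 0 : diff;
                if (r1[i] != expect || r2[i] != expect)
                    return 0;
            }
        }
    }
    return 1;
}
```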
> +cglobal blend_darken, 9, 10, 2, 0, top, top_linesize, bottom, bottom_linesize, dst, dst_linesize, width, start, end
[...]
> + movh m0, [topq + x]
> + movh m1, [bottomq + x]
> + pminub m0, m1
> + movh [dstq + x], m0
[...]
> +cglobal blend_lighten, 9, 10, 2, 0, top, top_linesize, bottom, bottom_linesize, dst, dst_linesize, width, start, end
[...]
> + movh m0, [topq + x]
> + movh m1, [bottomq + x]
> + pmaxub m0, m1
> + movh [dstq + x], m0
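You're only utilizing the lower half of the registers here: movh loads and
stores 8 bytes of each 16-byte xmm register. Since pminub/pmaxub work on
packed bytes directly, full-width loads and stores would process twice as
many pixels per iteration.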
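A minimal C intrinsics sketch of the full-width variant might look like the following. The function names (blend_darken_row, blend_lighten_row, darken_lighten_match_scalar) are hypothetical, and the scalar tail loop is only one assumed way of handling widths that are not a multiple of 16; it is not what the patch does. Each iteration loads 16 bytes with an unaligned load (movdqu) and applies _mm_min_epu8 (pminub) or _mm_max_epu8 (pmaxub) to the whole register.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Darken: dst = min(top, bottom) per byte, 16 pixels per iteration. */
static void blend_darken_row(const uint8_t *top, const uint8_t *bottom,
                             uint8_t *dst, int width)
{
    int x = 0;
    for (; x + 16 <= width; x += 16) {
        __m128i t = _mm_loadu_si128((const __m128i *)(top + x));
        __m128i b = _mm_loadu_si128((const __m128i *)(bottom + x));
        _mm_storeu_si128((__m128i *)(dst + x), _mm_min_epu8(t, b));
    }
    for (; x < width; x++) /* scalar tail (assumed handling) */
        dst[x] = top[x] < bottom[x] ? top[x] : bottom[x];
}

/* Lighten: dst = max(top, bottom) per byte, 16 pixels per iteration. */
static void blend_lighten_row(const uint8_t *top, const uint8_t *bottom,
                              uint8_t *dst, int width)
{
    int x = 0;
    for (; x + 16 <= width; x += 16) {
        __m128i t = _mm_loadu_si128((const __m128i *)(top + x));
        __m128i b = _mm_loadu_si128((const __m128i *)(bottom + x));
        _mm_storeu_si128((__m128i *)(dst + x), _mm_max_epu8(t, b));
    }
    for (; x < width; x++) /* scalar tail (assumed handling) */
        dst[x] = top[x] > bottom[x] ? top[x] : bottom[x];
}

/* Compares both rows against a plain scalar min/max on a width (1003)
 * that is not a multiple of 16, so the tail path is exercised too. */
static int darken_lighten_match_scalar(void)
{
    enum { W = 1003 };
    uint8_t top[W], bottom[W], dk[W], lt[W];
    for (int i = 0; i < W; i++) {
        top[i] = (uint8_t)(i * 7);
        bottom[i] = (uint8_t)(i * 13 + 5);
    }
    blend_darken_row(top, bottom, dk, W);
    blend_lighten_row(top, bottom, lt, W);
    for (int i = 0; i < W; i++) {
        uint8_t lo = top[i] < bottom[i] ? top[i] : bottom[i];
        uint8_t hi = top[i] < bottom[i] ? bottom[i] : top[i];
        if (dk[i] != lo || lt[i] != hi)
            return 0;
    }
    return 1;
}
```

In x86inc terms this corresponds to using movu with a stride of mmsize instead of movh with mmsize / 2.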
More information about the ffmpeg-devel mailing list