[FFmpeg-devel] [PATCH] avfilter/vf_blend: add x86 SIMD for some modes

Paul B Mahol onemda at gmail.com
Fri Oct 2 21:03:06 CEST 2015


On 10/2/15, Henrik Gramner <henrik at gramner.com> wrote:
> On Fri, Oct 2, 2015 at 6:57 PM, Paul B Mahol <onemda at gmail.com> wrote:
>> +INIT_XMM sse2
>> +cglobal blend_xor, 9, 10, 2, 0, top, top_linesize, bottom,
>> bottom_linesize, dst, dst_linesize, width, start, end
> [...]
>> +cglobal blend_or, 9, 10, 2, 0, top, top_linesize, bottom,
>> bottom_linesize, dst, dst_linesize, width, start, end
> [...]
>> +cglobal blend_and, 9, 10, 2, 0, top, top_linesize, bottom,
>> bottom_linesize, dst, dst_linesize, width, start, end
>
> You could do those using floating point operations (xorps, orps,
> andps), then you only need SSE instead of SSE2 (and AVX instead of
> AVX2 if you want to make versions using ymm registers).
>
>> +cglobal blend_addition, 9, 10, 3, 0, top, top_linesize, bottom,
>> bottom_linesize, dst, dst_linesize, width, start, end
> [...]
>> +        punpcklbw       m0, m2
>> +        punpcklbw       m1, m2
>> +        paddw           m0, m1
>> +        packuswb        m0, m0
>> +        movh    [dstq + x], m0
>> +        add           r10q, mmsize / 2
>
> paddusb
>

fixed locally.
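
For reference, a minimal C sketch of why the single instruction is enough. It models one byte lane of the original sequence (punpcklbw, paddw, packuswb) and of paddusb, then compares them over every input pair. The function names are hypothetical and not part of the patch, and the model assumes m2 holds zero, which the elided part of the quoted code does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Model of packuswb on one lane: signed 16-bit word to unsigned byte, saturated. */
uint8_t pack_unsigned_saturate(int16_t w)
{
    if (w < 0)
        return 0;
    if (w > 255)
        return 255;
    return (uint8_t)w;
}

/* Original sequence per byte: widen both inputs (punpcklbw with an assumed
 * zero register), add as words (paddw), then pack back (packuswb). */
uint8_t add_widened(uint8_t top, uint8_t bottom)
{
    int16_t sum = (int16_t)(top + bottom); /* at most 510, no 16-bit wrap */
    return pack_unsigned_saturate(sum);
}

/* paddusb per byte: unsigned add that saturates at 255. */
uint8_t add_saturated(uint8_t top, uint8_t bottom)
{
    unsigned sum = (unsigned)top + bottom;
    return sum > 255 ? 255 : (uint8_t)sum;
}

/* Exhaustively compare both forms over all 65536 input pairs. */
void check_addition_equivalence(void)
{
    for (int t = 0; t < 256; t++)
        for (int b = 0; b < 256; b++)
            assert(add_widened((uint8_t)t, (uint8_t)b) ==
                   add_saturated((uint8_t)t, (uint8_t)b));
}
```

Calling check_addition_equivalence() from a test program should pass silently. Besides saving instructions, paddusb works on all bytes of the register at once, whereas the widened form only handles half of them per iteration.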

>> +cglobal blend_subtract, 9, 10, 3, 0, top, top_linesize, bottom,
>> bottom_linesize, dst, dst_linesize, width, start, end
> [...]
>> +        punpcklbw       m0, m2
>> +        punpcklbw       m1, m2
>> +        psubw           m0, m1
>> +        packuswb        m0, m0
>
> psubusb

fixed locally.
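
The same reasoning applies here. A minimal C sketch, again with hypothetical function names and assuming m2 is zero: psubw can produce a negative word, packuswb clamps it to 0, and that is exactly what psubusb yields directly.

```c
#include <assert.h>
#include <stdint.h>

/* Model of packuswb on one lane: signed 16-bit word to unsigned byte, saturated. */
uint8_t pack_unsigned_saturate(int16_t w)
{
    if (w < 0)
        return 0;
    if (w > 255)
        return 255;
    return (uint8_t)w;
}

/* Original sequence per byte: widen (punpcklbw with an assumed zero register),
 * subtract as words (psubw), then pack back (packuswb). */
uint8_t subtract_widened(uint8_t top, uint8_t bottom)
{
    int16_t diff = (int16_t)(top - bottom); /* in the range -255..255 */
    return pack_unsigned_saturate(diff);
}

/* psubusb per byte: unsigned subtract that saturates at 0. */
uint8_t subtract_saturated(uint8_t top, uint8_t bottom)
{
    return top > bottom ? (uint8_t)(top - bottom) : 0;
}

/* Exhaustively compare both forms over all 65536 input pairs. */
void check_subtract_equivalence(void)
{
    for (int t = 0; t < 256; t++)
        for (int b = 0; b < 256; b++)
            assert(subtract_widened((uint8_t)t, (uint8_t)b) ==
                   subtract_saturated((uint8_t)t, (uint8_t)b));
}
```

check_subtract_equivalence() is expected to pass for every pair, so the unpack and pack steps can be dropped.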

>
>> +cglobal blend_darken, 9, 10, 2, 0, top, top_linesize, bottom,
>> bottom_linesize, dst, dst_linesize, width, start, end
> [...]
>> +        movh            m0, [topq + x]
>> +        movh            m1, [bottomq + x]
>> +        pminub          m0, m1
>> +        movh    [dstq + x], m0
> [...]
>> +cglobal blend_lighten, 9, 10, 2, 0, top, top_linesize, bottom,
>> bottom_linesize, dst, dst_linesize, width, start, end
> [...]
>> +        movh            m0, [topq + x]
>> +        movh            m1, [bottomq + x]
>> +        pmaxub          m0, m1
>> +        movh    [dstq + x], m0
>
> You're only utilizing the lower half of the registers here.

fixed locally.
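
To illustrate the point, movh moves only 8 bytes, so half of each 16-byte xmm register goes unused. A rough C sketch of the full-width form for darken, using SSE2 intrinsics where available. The function names, the 16-byte step and the scalar tail are assumptions made for this illustration, not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not from the patch: per-byte minimum over one row. */
void darken_row(const uint8_t *top, const uint8_t *bottom,
                uint8_t *dst, size_t width)
{
    size_t x = 0;
#if defined(__SSE2__)
    /* Full 16-byte unaligned loads and stores, stepping by the register size
     * instead of half of it. */
    for (; x + 16 <= width; x += 16) {
        __m128i t = _mm_loadu_si128((const __m128i *)(top + x));
        __m128i b = _mm_loadu_si128((const __m128i *)(bottom + x));
        _mm_storeu_si128((__m128i *)(dst + x), _mm_min_epu8(t, b)); /* pminub */
    }
#endif
    /* Scalar tail; this is also the whole loop when SSE2 is unavailable. */
    for (; x < width; x++)
        dst[x] = top[x] < bottom[x] ? top[x] : bottom[x];
}

/* Compare the row helper against a plain per-byte minimum on a width that
 * is not a multiple of 16, so both the vector loop and the tail are used. */
void check_darken_row(void)
{
    uint8_t top[37], bottom[37], dst[37];
    for (int i = 0; i < 37; i++) {
        top[i]    = (uint8_t)(i * 7);
        bottom[i] = (uint8_t)(255 - i * 5);
    }
    darken_row(top, bottom, dst, 37);
    for (int i = 0; i < 37; i++)
        assert(dst[i] == (top[i] < bottom[i] ? top[i] : bottom[i]));
}
```

Lighten would follow the same shape with _mm_max_epu8 (pmaxub) in place of the minimum.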
