[FFmpeg-devel] [PATCH 1/2] x86/vf_blend: add sse and ssse3 extremity functions
James Almer
jamrial at gmail.com
Wed Jun 28 02:46:58 EEST 2017
On 6/27/2017 8:19 PM, Ivan Kalvachev wrote:
> On 6/27/17, James Almer <jamrial at gmail.com> wrote:
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>> libavfilter/x86/vf_blend.asm | 25 +++++++++++++++++++++++++
>> libavfilter/x86/vf_blend_init.c | 4 ++++
>> tests/checkasm/vf_blend.c | 1 +
>> 3 files changed, 30 insertions(+)
>>
>> diff --git a/libavfilter/x86/vf_blend.asm b/libavfilter/x86/vf_blend.asm
>> index 33b1ad1496..25f6f5affc 100644
>> --- a/libavfilter/x86/vf_blend.asm
>> +++ b/libavfilter/x86/vf_blend.asm
>> @@ -286,6 +286,31 @@ BLEND_INIT difference, 3
>> jl .loop
>> BLEND_END
>>
>> +BLEND_INIT extremity, 8
>> + pxor m2, m2
>> + mova m4, [pw_255]
>> +.nextrow:
>> + mov xq, widthq
>> +
>> + .loop:
>> + movu m0, [topq + xq]
>> + movu m1, [bottomq + xq]
>> + punpckhbw m5, m0, m2
>> + punpcklbw m0, m2
>> + punpckhbw m6, m1, m2
>> + punpcklbw m1, m2
>> + psubw m3, m4, m0
>> + psubw m7, m4, m5
>> + psubw m3, m1
>> + psubw m7, m6
>> + ABS1 m3, m1
>> + ABS1 m7, m6
>
> Minor nitpick.
>
> There is also an ABS2 macro that takes 4 parameters and performs
> two interleaved ABS1 operations, which is (hopefully) faster on sse2.
> It should generate exactly the same code on ssse3.
Ah nice, pushed a change to use them. Thanks.
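
For readers following along, here is a rough scalar C sketch of what the quoted loop appears to compute per pixel: the bytes are widened to 16 bits, both inputs are subtracted from 255, and the absolute value is taken, i.e. |255 - top - bottom|. The function names below are hypothetical and not part of FFmpeg. The max(x, -x) helper only illustrates how an absolute value can be formed when pabsw is not available, which is my understanding of the non-ssse3 path of ABS1/ABS2, not a copy of those macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not FFmpeg API): absolute value as max(x, -x).
 * Assumed to mirror the idea behind the pre-ssse3 ABS path, where no
 * pabsw instruction is available. */
static int16_t abs_via_max(int16_t x)
{
    int16_t neg = (int16_t)(0 - x);
    return x > neg ? x : neg;
}

/* Hypothetical scalar reference for one row of the extremity blend.
 * The intermediate 255 - top - bottom lies in [-255, 255], so it needs
 * a 16-bit signed value, which is why the asm unpacks bytes to words. */
static void extremity_row_ref(const uint8_t *top, const uint8_t *bottom,
                              uint8_t *dst, size_t width)
{
    assert(top && bottom && dst);
    for (size_t x = 0; x < width; x++) {
        int16_t d = (int16_t)(255 - top[x] - bottom[x]);
        dst[x] = (uint8_t)abs_via_max(d); /* result always fits in 8 bits */
    }
}
```

The SIMD version does the same thing on a full register of pixels at once, handling the low and high halves of the unpacked bytes separately, which is where two independent ABS operations (and hence ABS2) come from.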