[FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI

James Almer jamrial at gmail.com
Mon Nov 14 16:31:49 EET 2022


On 11/14/2022 10:54 AM, Wang, Bin wrote:
> 
> 
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of James
>> Almer
>> Sent: Monday, November 14, 2022 9:36 PM
>> To: ffmpeg-devel at ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add
>> sobel filter optimization and unit test with intel AVX512 VNNI
>>
>> On 11/14/2022 10:30 AM, Wang, Bin wrote:
>>>> By using xmm# you're not taking into account any x86inc SWAPing, so
>>>> this is using xmm0 and xmm1 where the single scalar float input
>>>> arguments reside (at least on unix64), instead of xm0 and xm1 (xmm16
>>>> and xmm17) where the broadcasted scalars were stored.
>>>> This, again, only worked by chance on unix64 because you're using
>>>> scalar fmadd, and shouldn't work at all on win64.
>>>>
>>>> Also, all these as is are being encoded as VEX, not EVEX, but it
>>>> should be fine leaving them untouched instead of using xm#, since
>>>> they will be shorter (five bytes instead of six for some) by using the
>>>> lower, non callee-saved regs.
>>>
>>> Thanks for the help. I'm not familiar with WIN64 asm. So what I need to
>>> do is change the WIN64 swap from:
>>> SWAP xmm0, xmm2
>>> SWAP xmm1, xmm3
>>> To:
>>> VBROADCASTSS m0, xmm2
>>> VBROADCASTSS m1, xmm3
>>>
>>> Is that correct?
>>
>> Yes, that will ultimately broadcast the two scalars in xmm2 and xmm3 to
>> zmm16 and zmm17.
>> After that, what you need to do is either change the fmaddss instruction to
>> use the xm0 and xm1 macros instead of xmm0 and xmm1 (so that xmm16 and xmm17
>> with EVEX encoding are used), or, much like the broadcast above, use xmm2
>> and xmm3 explicitly on win64, so it remains VEX encoded.
> 
> So, to fix the issue, do these two changes look good to you?
> First, change the WIN64 swap from:
> SWAP xmm0, xmm2
> SWAP xmm1, xmm3
> To:
> VBROADCASTSS m0, xmm2
> VBROADCASTSS m1, xmm3
> 
> Second, change the fmaddss from:
> fmaddss   xmm4, xmm4, xmm0, xmm1
> To:
> fmaddss   xmm4, xmm4, xm0, xm1

Yes.
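
To illustrate why both changes matter, here is a minimal C model of the register traffic discussed in this thread. It is only a sketch, not FFmpeg code: the register file, the sentinel GARBAGE value, the helper names (mapped, broadcast_ss, fmadd_ss, sobel_tail_model) and the parameter names scale and delta are all hypothetical. It assumes, as described above, that m0/xm0 and m1/xm1 resolve to registers 16 and 17, that the two scalar floats arrive in xmm0/xmm1 on unix64 and in xmm2/xmm3 on win64, and that fmaddss computes a * b + c on the lowest lane.

```c
#include <assert.h>

#define NUM_REGS 32       /* AVX-512 exposes 32 vector registers */
#define LANES    16       /* 16 floats per 512-bit register */
#define GARBAGE  (-1.0f)  /* stand-in for whatever a register happened to hold */

typedef struct { float lane[LANES]; } Reg;

/* Assumed mapping from the thread: m0/xm0 -> register 16, m1/xm1 -> register 17. */
static int mapped(int n) { return 16 + n; }

/* Model of VBROADCASTSS dst, src: lane 0 of src is copied to every lane of dst. */
static void broadcast_ss(Reg *regs, int dst, int src)
{
    assert(dst >= 0 && dst < NUM_REGS && src >= 0 && src < NUM_REGS);
    float v = regs[src].lane[0];
    for (int i = 0; i < LANES; i++)
        regs[dst].lane[i] = v;
}

/* Model of "fmaddss dst, a, b, c": returns a * b + c on lane 0 only. */
static float fmadd_ss(const Reg *regs, int a, int b, int c)
{
    return regs[a].lane[0] * regs[b].lane[0] + regs[c].lane[0];
}

/*
 * win64:      nonzero if the two float arguments arrive in xmm2/xmm3,
 *             zero if they arrive in xmm0/xmm1 (unix64).
 * use_macros: nonzero to read the broadcast copies (xm0/xm1),
 *             zero to read the literal xmm0/xmm1.
 */
float sobel_tail_model(int win64, int use_macros,
                       float acc, float scale, float delta)
{
    Reg regs[NUM_REGS];
    for (int r = 0; r < NUM_REGS; r++)
        for (int i = 0; i < LANES; i++)
            regs[r].lane[i] = GARBAGE;

    /* Place the scalar arguments where the calling convention puts them. */
    int arg0 = win64 ? 2 : 0;
    int arg1 = win64 ? 3 : 1;
    regs[arg0].lane[0] = scale;
    regs[arg1].lane[0] = delta;

    /* First change: broadcast straight from the argument registers. */
    broadcast_ss(regs, mapped(0), arg0);
    broadcast_ss(regs, mapped(1), arg1);

    /* Second change: read xm0/xm1 rather than the literal xmm0/xmm1. */
    regs[4].lane[0] = acc;
    return use_macros ? fmadd_ss(regs, 4, mapped(0), mapped(1))
                      : fmadd_ss(regs, 4, 0, 1);
}
```

In this model, sobel_tail_model(0, 0, 3.0f, 2.0f, 0.5f) returns 6.5 because on unix64 the literal xmm0/xmm1 still hold the scalars in lane 0, which is the "worked by chance" case. With win64 set and use_macros clear, the same call reads the sentinel values and returns -4.0. With use_macros set it returns 6.5 under both conventions.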

