[FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI

James Almer jamrial at gmail.com
Mon Nov 14 13:34:54 EET 2022


On 11/14/2022 2:58 AM, Wang, Bin wrote:
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of James Almer
> Sent: Monday, November 14, 2022 10:43 AM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
> 
> On 11/4/2022 5:29 AM, bin.wang-at-intel.com at ffmpeg.org wrote:
>> +%macro FILTER_SOBEL 0
>> +%if UNIX64
>> +cglobal filter_sobel, 4, 15, 7, dst, width, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
>> +%else
>> +cglobal filter_sobel, 4, 15, 7, dst, width, rdiv, bias, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
>> +%endif
>> +%if WIN64
>> +    SWAP xmm0, xmm2
>> +    SWAP xmm1, xmm3
>> +    mov  r2q, matrixmp
>> +    mov  r3q, ptrmp
>> +    DEFINE_ARGS dst, width, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
>> +%endif
>> +    movsxdifnidn widthq, widthd
>> +    VBROADCASTSS m0, xmm0
>> +    VBROADCASTSS m1, xmm1
> 
> This and every other xmm# case should instead be xm#, to ensure the swapping is taken into account.
> 
> Sorry, I don't follow. Could you please explain why I have to use xm# for the swap to take effect (does SWAP on xmm# not work in WIN64 asm)? And how should I do it?

SWAP only affects the x86inc-defined macros m#, xm#, ym#, and zm#; a
literal xmm# name is left untouched. So the instructions above end up
encoded as vbroadcastss zmm2, xmm0 and vbroadcastss zmm3, xmm1 on WIN64.
In fact, now that I check, they end up as vbroadcastss zmm18, xmm0 and
vbroadcastss zmm19, xmm1, because x86inc purposely uses the higher 16
registers for these macros on all targets to avoid having to call
vzeroupper at the end. This works on unix64 by pure chance, because the
floats are already in xmm0 and xmm1 and all calculations then happen on
m#, xm# and ym#.

So you'll have to duplicate the VBROADCASTSS lines to broadcast xmm2 and 
xmm3 to m0 and m1 on WIN64 instead of using SWAP.
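
A minimal, untested sketch of that change, assuming the rest of the macro
stays as in the quoted patch. It is a fragment, not a complete function:
on WIN64 it would replace the two SWAP lines and sit where the two
VBROADCASTSS lines are now, after the existing DEFINE_ARGS block.

```nasm
; Fragment only: relies on x86inc (WIN64, VBROADCASTSS, m#) exactly as
; the quoted patch does, and on the argument layout described above.
%if WIN64
    ; rdiv and bias arrive in xmm2 and xmm3 on WIN64, so read them from
    ; there directly; no SWAP is needed.
    VBROADCASTSS m0, xmm2
    VBROADCASTSS m1, xmm3
%else
    ; On unix64 the two floats arrive in xmm0 and xmm1.
    VBROADCASTSS m0, xmm0
    VBROADCASTSS m1, xmm1
%endif
```

The source operands are written as literal xmm# names on purpose: they
name the physical registers that carry the arguments, while m0 and m1
remain whatever registers x86inc maps them to.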

