[FFmpeg-devel] [PATCH 3/3] avfilter/vf_framerate: add SIMD functions for frame blending
Marton Balint
cus at passwd.hu
Mon Jan 15 01:09:46 EET 2018
On Sun, 14 Jan 2018, Henrik Gramner wrote:
> On Sat, Jan 13, 2018 at 10:57 PM, Marton Balint <cus at passwd.hu> wrote:
>> + .loop:
>> + movu m0, [src1q + xq]
>> + movu m1, [src2q + xq]
>> + punpckl%1%2 m5, m0, m2 ; 0e0f0g0h
>> + punpckh%1%2 m0, m2 ; 0a0b0c0d
>> + punpckl%1%2 m6, m1, m2 ; 0E0F0G0H
>> + punpckh%1%2 m1, m2 ; 0A0B0C0D
>> + pmull%2 m0, m3
>> + pmull%2 m5, m3
>> + pmull%2 m1, m4
>> + pmull%2 m6, m4
>> + padd%2 m0, m7
>> + padd%2 m5, m7
>> + padd%2 m0, m1
>> + padd%2 m5, m6
>
> pmaddubsw should work here for the 8-bit case. pmaddwd might work for
> the 16-bit case depending on how many bits are actually used.
>
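To make the SIMD variants easier to compare, here is a scalar model of what the quoted 8-bit loop computes per pixel. It zero-extends both pixels, multiplies each by its weight, adds the rounding term and sums. The function name `blend_ref_8` and the parameter convention are hypothetical. So is the example setting `factor1 + factor2 == 256`, `half == 128`, `shift == 8`, which is just one choice that keeps every intermediate inside a 16-bit lane. The final shift and pack are assumed, because they are not part of the quoted hunk.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the quoted punpck/pmullw/paddw loop.
 * Assumes the caller picks factors, half and shift so that the sum fits
 * in 16 bits (e.g. factor1 + factor2 == 256, half == 128, shift == 8). */
static void blend_ref_8(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
                        size_t width, unsigned factor1, unsigned factor2,
                        unsigned half, unsigned shift)
{
    for (size_t x = 0; x < width; x++) {
        /* punpck against a zeroed m2 widens each pixel to a word; pmullw
         * applies the weights; the paddw steps add the rounding term and
         * combine the two products. The cast mirrors a 16-bit lane. */
        uint16_t sum = (uint16_t)(src1[x] * factor1 + half + src2[x] * factor2);
        dst[x] = (uint8_t)(sum >> shift); /* assumed psrlw + packuswb */
    }
}
```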
As far as I can see, I have to make the blending factors 7-bit (15-bit
for the 16-bit case) for this to work, because the pmadd* instructions
treat the factors as signed integers. Losing 1 bit of precision in the
blending factors is probably not a problem for the framerate filter.
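The 7-bit argument can be checked exhaustively. The sketch below emulates one word lane of pmaddubsw in C: unsigned bytes (pixels) are multiplied by signed bytes (factors), and each adjacent pair of products is summed with signed 16-bit saturation. It assumes, as a hypothetical convention not stated in the thread, that `factor1 + factor2 == 128`. Both factors must then lie in 1..127 to fit a signed byte. The helper names are illustrative only.

```c
#include <stdint.h>

/* One word lane of pmaddubsw: u* are unsigned bytes from the first operand,
 * s* are signed bytes from the second; products are summed and saturated. */
static int16_t maddubs_lane(uint8_t u0, int8_t s0, uint8_t u1, int8_t s1)
{
    int32_t sum = (int32_t)u0 * s0 + (int32_t)u1 * s1;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Counts (factor1, a, b) combinations where saturation would change the
 * result, assuming factor1 + factor2 == 128 with both factors <= 127. */
static long count_saturating_cases(void)
{
    long bad = 0;
    for (int f1 = 1; f1 <= 127; f1++) {
        int f2 = 128 - f1;
        for (int a = 0; a <= 255; a++)
            for (int b = 0; b <= 255; b++) {
                int32_t exact = a * f1 + b * f2; /* at most 255 * 128 */
                if (maddubs_lane((uint8_t)a, (int8_t)f1,
                                 (uint8_t)b, (int8_t)f2) != exact)
                    bad++;
            }
    }
    return bad;
}
```

The largest possible sum is 255 * 128 = 32640, so no combination saturates. Two full-size 7-bit weights, by contrast, would overflow: 255 * 127 * 2 = 64770. For the 16-bit path, pmaddwd treats the pixel words as signed too, so they must stay below 2^15. That matches the caveat about how many bits are actually used. With both factors at most 2^15 - 1 and summing to 2^15, the pairwise dword sum is at most (2^15 - 1) * 2^15 < 2^31, so it cannot overflow.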
So my loop would look like this:
.loop:
    movu          m0, [src1q + xq]
    movu          m1, [src2q + xq]
    SBUTTERFLY    %1%2, 0, 1, 5     ; aAbBcCdD
                                    ; eEfFgGhH
    pmadd%3       m0, m3
    pmadd%3       m1, m3
    padd%2        m0, m7
    padd%2        m1, m7
    psrl%2        m0, %4            ; 0A0B0C0D
    psrl%2        m1, %4            ; 0E0F0G0H
    packus%2%1    m0, m1            ; ABCDEFGH
    movu          [dstq + xq], m0
    add           xq, mmsize
    jl            .loop
Is this what you had in mind?
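The following scalar model of one iteration of the proposed 8-bit loop may help when reviewing it. It assumes one xmm register (16 bytes) per iteration, 7-bit factors, 1 <= shift <= 15, and that m3 holds the factor pair {factor1, factor2} repeated in every word. That layout is implied by using a single m3 for both interleaved pixels, but the thread does not spell it out. All function names are hypothetical.

```c
#include <stdint.h>

#define BLOCK 16 /* bytes per xmm register, i.e. mmsize for SSSE3 */

/* One word lane of pmaddubsw: unsigned byte times signed byte, twice,
 * summed with signed 16-bit saturation. */
static int16_t maddubs_lane(uint8_t u0, int8_t s0, uint8_t u1, int8_t s1)
{
    int32_t sum = (int32_t)u0 * s0 + (int32_t)u1 * s1;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* packuswb on one lane: signed word to unsigned byte with saturation. */
static uint8_t packus_lane(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar model of one loop iteration (assumes 1 <= shift <= 15). */
static void blend_block_pmaddubsw(uint8_t dst[BLOCK], const uint8_t src1[BLOCK],
                                  const uint8_t src2[BLOCK], int8_t factor1,
                                  int8_t factor2, uint16_t half, unsigned shift)
{
    uint16_t words[BLOCK];
    for (int i = 0; i < BLOCK; i++) {
        /* SBUTTERFLY bw places pixel i of src1 next to pixel i of src2;
         * pixels 0..7 land in m0 and pixels 8..15 in m1. */
        int16_t prod = maddubs_lane(src1[i], factor1, src2[i], factor2); /* pmaddubsw */
        uint16_t sum = (uint16_t)((uint16_t)prod + half);                  /* paddw, wraps */
        words[i] = (uint16_t)(sum >> shift);                               /* psrlw */
    }
    /* packuswb m0, m1 emits m0's words (pixels 0..7), then m1's (8..15). */
    for (int i = 0; i < BLOCK; i++)
        dst[i] = packus_lane((int16_t)words[i]);
}
```

Because packuswb concatenates m0 and m1 in that order, the interleave and the pack cancel out and the output pixels come back in source order.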
>> + pinsrw xm3, r8m, 0 ; factor1
>> + pinsrw xm4, r9m, 0 ; factor2
>> + pinsrw xm7, r10m, 0 ; half
>> + SPLATW m3, xm3
>> + SPLATW m4, xm4
>> + SPLATW m7, xm7
>
> vpbroadcast* from memory on avx2, otherwise movd instead of pxor+pinsrw.
>
>> + pxor m3, m3
>> + pxor m4, m4
>> + pxor m7, m7
>> + pinsrw xm3, r8m, 0 ; factor1
>> + pinsrw xm4, r9m, 0 ; factor2
>> + pinsrw xm7, r10m, 0 ; half
>> + XSPLATD 3
>> + XSPLATD 4
>> + XSPLATD 7
>
> Ditto.
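With the pmadd* version, both weights come from one register, so the factors have to alternate inside it. One option, sketched here as an assumption and not something the patch does, is to pack the pair on the C side. The asm then needs only a single movd plus SPLATW, or SPLATD for the 16-bit path, or a vpbroadcastw/vpbroadcastd from memory on AVX2. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical: packs two 7-bit factors so that the word's little-endian
 * byte order is {factor1, factor2}, the pattern pmaddubsw needs per lane. */
static uint16_t pack_factors_8(uint8_t factor1, uint8_t factor2)
{
    return (uint16_t)(factor1 | (factor2 << 8));
}

/* Same idea for pmaddwd: one dword holding {factor1, factor2}. */
static uint32_t pack_factors_16(uint16_t factor1, uint16_t factor2)
{
    return (uint32_t)factor1 | ((uint32_t)factor2 << 16);
}

/* Scalar picture of a word broadcast: the packed word repeated across a
 * 16-byte register, stored little-endian as on x86. */
static void splat_word(uint8_t reg[16], uint16_t word)
{
    for (int i = 0; i < 16; i += 2) {
        reg[i]     = (uint8_t)(word & 0xff); /* low byte first */
        reg[i + 1] = (uint8_t)(word >> 8);
    }
}
```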
>
>> + neg word r11m ; shift = -shift
>> + add word r11m, 16 ; shift += 16
>> + pxor m2, m2
>> + pinsrw xm2, r11m, 0 ; 16 - shift
>> + pslld m3, xm2
>> + pslld m4, xm2
>> + pslld m7, xm2
>
> You probably want to use a temporary register instead of doing slow
> load-modify-store instructions.
OK, I will rework these, although these parts are only initialization
code, so I guess they are not performance critical.
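The pre-shift in the quoted setup code can be checked in C. The count `16 - shift` is kept in a local temporary, analogous to a scratch register, instead of being computed by rewriting the r11m argument slot. The sketch makes three assumptions that the thread does not state: the loop keeps only the high 16 bits of each dword at the end, `factor1 + factor2 == 1 << shift`, and `half == 1 << (shift - 1)`. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed per-pixel blend for 16-bit samples. */
static uint32_t blend16_ref(uint32_t a, uint32_t b, uint32_t f1, uint32_t f2,
                            uint32_t half, unsigned shift)
{
    return (a * f1 + b * f2 + half) >> shift;
}

/* Pre-shifted variant: factors and rounding term are shifted left by
 * 16 - shift once during setup, so each pixel ends with a fixed >> 16.
 * uint32_t arithmetic wraps mod 2^32, like pmulld and paddd. */
static uint32_t blend16_preshifted(uint32_t a, uint32_t b, uint32_t f1, uint32_t f2,
                                   uint32_t half, unsigned shift)
{
    const unsigned pre = 16 - shift; /* temporary, not written back to the argument */
    const uint32_t f1s = f1 << pre, f2s = f2 << pre, halfs = half << pre;
    return (a * f1s + b * f2s + halfs) >> 16;
}
```

Under these assumptions, a * f1 + b * f2 + half < 2^(16 + shift). Shifting left by 16 - shift therefore stays below 2^32, and taking the top 16 bits equals the original right shift by `shift`.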
Thanks,
Marton