[FFmpeg-devel] [PATCH 4/4] avfilter/vf_v360: x86 SIMD for interpolations
James Almer
jamrial at gmail.com
Wed Sep 4 23:56:22 EEST 2019
On 9/4/2019 5:47 PM, Henrik Gramner wrote:
> On Wed, Sep 4, 2019 at 10:01 PM James Almer <jamrial at gmail.com> wrote:
>> On 9/4/2019 4:28 PM, Paul B Mahol wrote:
>>> + vpmulld m3, m1, m0
>>> + vpaddd m1, m3, m2
>>
>> pmulld m1, m0
>> paddd m1, m2
>
> Could use pmaddwd instead as well, it's faster than pmulld on pretty
> much every CPU.
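
For reference, a rough scalar C sketch of why pmaddwd can stand in for
the pmulld + paddd pair. It assumes the row, column and stride each fit
in a signed 16-bit word and that the operands are packed as (v, u) and
(stride, 1). The patch may lay its registers out differently. The
function names are made up for illustration and model a single 32-bit
lane only.

```c
#include <stdint.h>

/* One 32-bit lane of pmulld followed by paddd: v * stride + u,
 * with wrapping 32-bit arithmetic. */
static int32_t lane_mul_add(int32_t v, int32_t stride, int32_t u)
{
    return (int32_t)((uint32_t)v * (uint32_t)stride + (uint32_t)u);
}

/* Pack two signed 16-bit words into one dword (lo in bits 0..15). */
static uint32_t pack_words(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* One 32-bit lane of pmaddwd: multiply the signed 16-bit words
 * pairwise and add the two 32-bit products. */
static int32_t lane_pmaddwd(uint32_t a, uint32_t b)
{
    int16_t a_lo = (int16_t)(a & 0xFFFF), a_hi = (int16_t)(a >> 16);
    int16_t b_lo = (int16_t)(b & 0xFFFF), b_hi = (int16_t)(b >> 16);
    uint32_t p_lo = (uint32_t)((int32_t)a_lo * b_lo);
    uint32_t p_hi = (uint32_t)((int32_t)a_hi * b_hi);

    return (int32_t)(p_lo + p_hi); /* wraps like the instruction */
}
```

With a = pack_words(v, u) and b = pack_words(stride, 1), lane_pmaddwd(a, b)
gives v * stride + u * 1, the same offset as lane_mul_add(v, stride, u),
as long as the 16-bit assumption holds.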
>
>>> + mova m2, m4
>>
>> Pointless mova. Just use m4 in the vpgatherdd below.
>
> No, it's required. Gathers overwrite the mask register.
Ah, my bad.
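
For anyone else tripping over this, a small scalar C model of the
behaviour. It is only a sketch: model_vpgatherdd is a made-up name, it
treats the indices as byte offsets (scale 1, as in [srcq + m1]), and it
ignores faults.

```c
#include <stdint.h>
#include <string.h>

#define LANES 8 /* 32-bit lanes in a ymm register */

/* Scalar model of: vpgatherdd dst, [base + idx], mask
 * A lane is loaded only if the top bit of its mask element is set.
 * On completion the whole mask register is zeroed, which is why the
 * caller needs a scratch copy if the mask is to be reused. */
static void model_vpgatherdd(int32_t dst[LANES], const uint8_t *base,
                             const int32_t idx[LANES],
                             uint32_t mask[LANES])
{
    for (int i = 0; i < LANES; i++) {
        if (mask[i] & 0x80000000u)
            memcpy(&dst[i], base + idx[i], sizeof(dst[i]));
        mask[i] = 0; /* the instruction clears the mask */
    }
}
```

Passing a copy of the all-ones mask (the role of m2 above) leaves the
original (m4) intact for the next gather. Passing the same mask twice
makes the second call load nothing.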
>
>>> + vpgatherdd m5, [srcq + m1], m2
>>> + vextracti128 xm3, m5, 1
>>> + vpshufb m1, m5, m6
>>> + vpshufb m2, m3, m6
>>
>> You could make these two pshufb use xmm regs, since you don't care
>> what's in the upper 128 bits.
>
> Or a single ymm pshufb before the vextracti128.
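
A scalar C sketch of why the two orderings can match. The ymm form of
vpshufb shuffles each 128-bit lane independently, so shuffling first and
extracting afterwards gives the same two halves as long as the control
register holds the same pattern in both lanes. That is an assumption
about m6 here, not something visible in the quoted hunk. The helper
names are made up.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the 128-bit pshufb: each control byte selects a
 * source byte by its low 4 bits, or yields 0 if its top bit is set. */
static void model_pshufb128(uint8_t dst[16], const uint8_t src[16],
                            const uint8_t ctl[16])
{
    uint8_t tmp[16]; /* temporary so dst may alias src */

    for (int i = 0; i < 16; i++)
        tmp[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
    memcpy(dst, tmp, sizeof(tmp));
}

/* Scalar model of the 256-bit vpshufb: two independent 128-bit
 * shuffles, each lane using its own half of the control. */
static void model_vpshufb256(uint8_t dst[32], const uint8_t src[32],
                             const uint8_t ctl[32])
{
    model_pshufb128(dst,      src,      ctl);
    model_pshufb128(dst + 16, src + 16, ctl + 16);
}
```

After model_vpshufb256, bytes 0..15 are the shuffled low half and bytes
16..31 (what vextracti128 with index 1 would return) are the shuffled
high half, so only one shuffle is needed.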