[FFmpeg-devel] [PATCH] avfilter/vf_overlay: add x86 SIMD for yuv444 format when main stream has no alpha
Paul B Mahol
onemda at gmail.com
Mon Apr 30 21:57:04 EEST 2018
On 4/30/18, Henrik Gramner <henrik at gramner.com> wrote:
> On Mon, Apr 30, 2018 at 6:17 PM, Paul B Mahol <onemda at gmail.com> wrote:
>> + .loop0:
>> + movu m1, [dq + xq]
>> + movu m2, [aq + xq]
>> + movu m3, [sq + xq]
>> +
>> + pshufb m1, [pb_b2dw]
>> + pshufb m2, [pb_b2dw]
>> + pshufb m3, [pb_b2dw]
>> + mova m4, [pd_255]
>> + psubd m4, m2
>> + pmulld m1, m4
>> + pmulld m3, m2
>> + paddd m1, m3
>> + paddd m1, [pd_128]
>> + pmulld m1, [pd_257]
>> + psrad m1, 16
>> + pshufb m1, [pb_dw2b]
>> + movd [dq+xq], m1
>> + add xq, mmsize / 4
>
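For reference, the arithmetic in the quoted loop is d * (255 - a) + s * a, followed by adding 128, multiplying by 257 and shifting right by 16, which stands in for a rounded division by 255. A minimal scalar sketch in C, assuming 8-bit samples (the function names are made up for illustration and are not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Rounded x / 255 using the add/multiply/shift sequence from the
 * quoted loop (paddd 128, pmulld 257, psrad 16). */
static uint32_t div255_fast(uint32_t x)
{
    assert(x <= 255u * 255u);   /* largest sum the blend can produce */
    return ((x + 128u) * 257u) >> 16;
}

/* Reference: x / 255 rounded to nearest, using a real division. */
static uint32_t div255_ref(uint32_t x)
{
    return (x + 127u) / 255u;
}

/* One blended sample: d = main plane, s = overlay plane, a = overlay alpha. */
static uint8_t blend_px(uint8_t d, uint8_t s, uint8_t a)
{
    return (uint8_t)div255_fast((uint32_t)d * (255u - a) + (uint32_t)s * a);
}

/* Count the inputs for which the fast form disagrees with the reference. */
static unsigned count_div255_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t x = 0; x <= 255u * 255u; x++)
        bad += div255_fast(x) != div255_ref(x);
    return bad;
}
```

count_div255_mismatches() walks every value the blend sum can take, so it can be used to check that the shortcut rounds the same way as the division.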
> Unpacking to dwords seems inefficient when you could do something like
> this (untested):
>
> mova m3, [pw_255]
> mova m4, [pw_128]
> mova m5, [pw_257]
> .loop0:
> pmovzxbw m0, [sq + xq]
> pmovzxbw m2, [aq + xq]
> pmovzxbw m1, [dq + xq]
> pmullw m0, m2
> pxor m2, m3
> pmullw m1, m2
> paddw m0, m4
> paddw m0, m1
> pmulhuw m0, m5
> packuswb m0, m0
> movq [dq+xq], m0
> add xq, mmsize / 2
Will experiment with this.
>
> which does twice as much per iteration. Also note that pmulld is slow
> on most CPUs.
This SIMD is not for CPUs found in museums.
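Whether the word-sized variant is safe depends on every intermediate fitting in 16 bits. A rough scalar model of one lane in C, assuming 8-bit inputs, with each step truncated to uint16_t the way the corresponding instruction would truncate it (function names are hypothetical, and this is only a model of the suggested loop, not tested assembly):

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit path, as in the original loop (pmulld / psrad). */
static uint8_t blend_dword(uint8_t d, uint8_t s, uint8_t a)
{
    uint32_t v = (uint32_t)d * (255u - a) + (uint32_t)s * a + 128u;
    return (uint8_t)((v * 257u) >> 16);
}

/* 16-bit path, modelling one word lane of the suggested loop. */
static uint8_t blend_word(uint8_t d, uint8_t s, uint8_t a)
{
    uint16_t inv = (uint16_t)(a ^ 255u);            /* pxor: equals 255 - a for a in 0..255 */
    uint16_t sa  = (uint16_t)((uint32_t)s * a);     /* pmullw keeps the low 16 bits */
    uint16_t di  = (uint16_t)((uint32_t)d * inv);   /* pmullw */
    uint16_t sum = (uint16_t)(sa + 128u);           /* paddw */
    sum = (uint16_t)(sum + di);                     /* paddw */
    uint16_t hi = (uint16_t)(((uint32_t)sum * 257u) >> 16); /* pmulhuw keeps the high 16 bits */
    return (uint8_t)(hi > 255 ? 255 : hi);          /* packuswb saturation, never reached here */
}

/* Compare both paths over every (d, s, a) byte triple. */
static unsigned long count_lane_mismatches(void)
{
    unsigned long bad = 0;
    for (unsigned d = 0; d < 256; d++)
        for (unsigned s = 0; s < 256; s++)
            for (unsigned a = 0; a < 256; a++)
                bad += blend_word((uint8_t)d, (uint8_t)s, (uint8_t)a) !=
                       blend_dword((uint8_t)d, (uint8_t)s, (uint8_t)a);
    return bad;
}
```

The largest sum is 255 * 255 + 128 = 65153, which is below 65536, so the word lanes should not wrap and the two paths should agree.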
>
>> + .loop1:
>> + xor tq, tq
>> + xor uq, uq
>> + xor vq, vq
>> + mov rd, 255
>> + mov tb, [aq + xq]
>> + neg tb
>> + add rb, tb
>> + mov ub, [sq + xq]
>> + neg tb
>> + imul ud, td
>> + mov vb, [dq + xq]
>> + imul rd, vd
>> + add rd, ud
>> + add rd, 128
>> + imul rd, 257
>> + sar rd, 16
>> + mov [dq + xq], rb
>> + add xq, 1
>> + cmp xq, wq
>> + jl .loop1
>
> Is doing the tail in scalar necessary? E.g. can you pad the buffers so
> that reading/writing past the end is OK and just run the SIMD loop?
Overlay does not operate that way: you can overlay a single pixel onto an HD 720p frame.
Do you get it now?
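To illustrate the constraint, here is a scalar C sketch of a row blend that runs full-width chunks first and then a per-pixel tail, so nothing outside the overlaid span of the main frame is written. LANES, the function names and the test row layout are assumptions for illustration, not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 8  /* assumed: pixels handled per SIMD iteration */

static uint8_t blend_px(uint8_t d, uint8_t s, uint8_t a)
{
    uint32_t v = (uint32_t)d * (255u - a) + (uint32_t)s * a + 128u;
    return (uint8_t)((v * 257u) >> 16);
}

/* Blend w overlay pixels into d, touching exactly w bytes of d. */
static void blend_row(uint8_t *d, const uint8_t *s, const uint8_t *a, size_t w)
{
    size_t x = 0;
    for (; x + LANES <= w; x += LANES)      /* stands in for .loop0 */
        for (size_t i = 0; i < LANES; i++)
            d[x + i] = blend_px(d[x + i], s[x + i], a[x + i]);
    for (; x < w; x++)                      /* stands in for .loop1 */
        d[x] = blend_px(d[x], s[x], a[x]);
}

/* Overlay w opaque pixels at offset 4 of a 32-byte main row and check
 * that only those w bytes changed. Returns 1 on success. */
static int row_ok(size_t w)
{
    uint8_t row[32], s[32], a[32];
    assert(w <= 24);
    memset(row, 7, sizeof(row));
    memset(s, 200, sizeof(s));
    memset(a, 255, sizeof(a));
    blend_row(row + 4, s, a, w);
    for (size_t i = 0; i < sizeof(row); i++) {
        uint8_t want = (i >= 4 && i < 4 + w) ? 200 : 7;
        if (row[i] != want)
            return 0;
    }
    return 1;
}
```

With w = 1 the chunked loop never runs and only the tail touches the row, which is the one-pixel-on-a-large-frame case.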
>
> If that's impossible it'd probably be better to do a separate SIMD
> loop and pinsr/pextr input/output pixels depending on the number of
> elements left.
That seems too complicated.
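For comparison, a simpler stand-in for the pinsr/pextr idea is to copy the leftover pixels into small padded scratch buffers, run the full-width kernel there, and copy back only the valid bytes. This is a different technique from the one suggested above, sketched in scalar C under the assumption of 8 pixels per iteration; all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 8  /* assumed: pixels handled per SIMD iteration */

/* Full-width kernel: always reads and writes LANES pixels. */
static void blend_lanes(uint8_t *d, const uint8_t *s, const uint8_t *a)
{
    for (int i = 0; i < LANES; i++) {
        uint32_t v = (uint32_t)d[i] * (255u - a[i]) + (uint32_t)s[i] * a[i] + 128u;
        d[i] = (uint8_t)((v * 257u) >> 16);
    }
}

/* Tail of n < LANES pixels: stage through zero-padded scratch buffers so
 * the kernel never reads or writes outside the n valid pixels. */
static void blend_tail_padded(uint8_t *d, const uint8_t *s, const uint8_t *a, size_t n)
{
    uint8_t td[LANES] = {0}, ts[LANES] = {0}, ta[LANES] = {0};
    assert(n < LANES);
    memcpy(td, d, n);
    memcpy(ts, s, n);
    memcpy(ta, a, n);
    blend_lanes(td, ts, ta);
    memcpy(d, td, n);           /* only the valid pixels go back */
}

/* Blend an n-pixel opaque tail at offset 4 of a 16-byte row and check
 * that the surrounding bytes are unchanged. Returns 1 on success. */
static int tail_padded_ok(size_t n)
{
    uint8_t row[16], s[LANES], a[LANES];
    memset(row, 7, sizeof(row));
    memset(s, 200, sizeof(s));
    memset(a, 255, sizeof(a));
    blend_tail_padded(row + 4, s, a, n);
    for (size_t i = 0; i < sizeof(row); i++) {
        uint8_t want = (i >= 4 && i < 4 + n) ? 200 : 7;
        if (row[i] != want)
            return 0;
    }
    return 1;
}
```

The extra copies cost something per row, so whether this beats a scalar tail would have to be measured.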