[FFmpeg-devel] [PATCH 3/4] libswscale/x86/rgb2rgb: add uyvytoyuv422 avx2
Wu, Jianhua
jianhua.wu at intel.com
Tue Sep 28 10:13:20 EEST 2021
Min Chen wrote:
>
> The current algorithm could be improved. Could you combine these optimizations
> with your patches? The extra VPERM makes the code a little slower.
>
>
>
> On Haswell
> Current algorithm:
> RSHIFT_COPY m6, m2, 1 ; UYVY UYVY -> YVYU YVY...
> pand m6, m1 ; YxYx YxYx...
> RSHIFT_COPY m7, m3, 1 ; UYVY UYVY -> YVYU YVY...
> pand m7, m1 ; YxYx YxYx...
> packuswb m6, m7 ; YYYY YYYY...
>
>
> Latency:
> 1 + 1 + 1 + 1 + 1 = 5
>
>
> Proposed:
> pshufb m6, m2, mX ; UYVY UYVY -> xxxx YYYY
> pshufb m7, m3, mX
> punpcklqdq m6, m7 ; YYYY YYYY
>
>
> Latency:
> 1 + 1 + 1 = 3
>
>
> I guess the current algorithm was written for compatibility with SSE2, because
> PSHUFB was only added in SSSE3.
> Now that we are optimizing for AVX, AVX2 and AVX512, I suggest we use the
> proposed algorithm to get more performance.
>
>
> Regards,
> Min Chen
>
Hi Min Chen,
Thanks for the careful review. You're right.
Using the instructions added in AVX2/AVX512 should be better. I'll try
your proposal and see whether it performs better. If so, I'll resubmit the patches.
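
For reference, here is a rough scalar C model of the two sequences quoted above, only to check that they extract the same Y bytes. It is a sketch under assumptions, not the patch itself: it models a single 128-bit lane per register, it assumes RSHIFT_COPY behaves like a one-byte psrldq and that m1 holds 0x00FF in every word, and the contents of mX below are a hypothetical mask (the quoted mail does not give one). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Current sequence, modelled on one 128-bit lane per input register:
 * psrldq by 1 byte, pand with 0x00FF per word, then packuswb. */
static void y_shift_mask_pack(const uint8_t a[16], const uint8_t b[16],
                              uint8_t out[16])
{
    const uint8_t *src[2] = { a, b };
    for (int r = 0; r < 2; r++) {
        uint8_t t[16];
        for (int i = 0; i < 15; i++)      /* psrldq: UYVY... -> YVYU... */
            t[i] = src[r][i + 1];
        t[15] = 0;
        for (int w = 0; w < 8; w++) {
            /* pand 0x00FF keeps only the low byte of each word (Y), so the
             * saturating packuswb never has to clamp in this model. */
            uint16_t word = (uint16_t)(t[2 * w] | (t[2 * w + 1] << 8));
            word &= 0x00FF;
            out[8 * r + w] = (uint8_t)word;
        }
    }
}

/* pshufb on one lane: mask bit 7 set -> 0, else pick src[mask & 15]. */
static void emu_pshufb(const uint8_t src[16], const uint8_t mask[16],
                       uint8_t dst[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}

/* Proposed sequence: two pshufb, then punpcklqdq joins the low qwords. */
static void y_shuffle_unpack(const uint8_t a[16], const uint8_t b[16],
                             uint8_t out[16])
{
    /* Hypothetical mX: gather the odd (Y) bytes into the low qword and
     * zero the high qword. */
    static const uint8_t mx[16] = {
        1, 3, 5, 7, 9, 11, 13, 15,
        0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
    };
    uint8_t ta[16], tb[16];
    emu_pshufb(a, mx, ta);
    emu_pshufb(b, mx, tb);
    memcpy(out, ta, 8);        /* punpcklqdq: low qword of first operand  */
    memcpy(out + 8, tb, 8);    /* followed by low qword of second operand */
}

/* Both paths should yield the Y bytes of a, then the Y bytes of b. */
static void self_check(void)
{
    uint8_t a[16], b[16], cur[16], prop[16];
    for (int i = 0; i < 16; i++) {
        a[i] = (uint8_t)(i * 7 + 200);    /* includes values above 127 */
        b[i] = (uint8_t)(i * 13 + 5);
    }
    y_shift_mask_pack(a, b, cur);
    y_shuffle_unpack(a, b, prop);
    assert(memcmp(cur, prop, 16) == 0);
    for (int i = 0; i < 8; i++) {
        assert(cur[i] == a[2 * i + 1]);
        assert(cur[8 + i] == b[2 * i + 1]);
    }
}
```

Calling self_check() from a small main() is enough to run the comparison. This says nothing about speed; that still needs measuring on real hardware with the actual assembly.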
Best regards,
Jianhua