[FFmpeg-devel] [PATCH v2 3/4] libswscale/x86/rgb2rgb: add uyvytoyuv422 avx2

Thu Sep 30 05:29:18 EEST 2021

Hello,

>+pb_shuffle_low: times 4 db 1, 3, 5, 7, 9, 11, 13, 15, -1, -1, -1, -1, -1, -1, -1, -1
Why do we need times 4?
AVX2 provides the VPBROADCASTQ instruction to load this constant into a SIMD register.
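For reference, here is a scalar C model of what this constant does under PSHUFB. It is my own sketch, not code from the patch; pshufb_lane and check_luma_shuffle are hypothetical names. A mask byte with its top bit set (the -1 entries) writes zero. Otherwise its low four bits select a source byte within the same 16-byte lane. In UYVY the odd bytes 1, 3, ..., 15 are the Y samples, so one 16-byte block yields 8 luma bytes followed by 8 zeros. VPSHUFB on a ymm register applies the pattern to each 128-bit lane independently, so a full 256-bit load needs only two copies of the 16-byte row (32 bytes), while times 4 emits 64 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* pb_shuffle_low from the patch: the odd (Y) bytes of a UYVY block,
 * followed by eight -1 entries that PSHUFB turns into zeros. */
static const int8_t pb_shuffle_low[16] = {
    1, 3, 5, 7, 9, 11, 13, 15, -1, -1, -1, -1, -1, -1, -1, -1
};

/* Hypothetical scalar model of PSHUFB on a single 128-bit lane:
 * top bit set -> 0, otherwise pick src[mask & 0x0F]. */
static void pshufb_lane(uint8_t dst[16], const uint8_t src[16],
                        const int8_t mask[16])
{
    for (int i = 0; i < 16; i++) {
        uint8_t m = (uint8_t)mask[i];
        dst[i] = (m & 0x80) ? 0 : src[m & 0x0F];
    }
}

/* Hypothetical self-check: 8 UYVY pixels in, 8 Y bytes + 8 zeros out. */
static void check_luma_shuffle(void)
{
    uint8_t src[16], dst[16];
    for (int i = 0; i < 16; i++)            /* chroma = 0xAA, Y = 100.. */
        src[i] = (i & 1) ? (uint8_t)(100 + i / 2) : 0xAA;
    pshufb_lane(dst, src, pb_shuffle_low);
    for (int i = 0; i < 8; i++)
        assert(dst[i] == 100 + i);
    for (int i = 8; i < 16; i++)
        assert(dst[i] == 0);
}
```

The high 8 bytes of each lane are zeroed by the -1 entries, which is why only the low half of each lane carries luma after the shuffle.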

Moreover, the same approach can also be applied to the U/V planes to improve them.
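A similar scalar sketch for the U/V planes, assuming chroma is gathered with one PSHUFB per 16-byte block: U sits at bytes 0, 4, 8, 12 and V at bytes 2, 6, 10, 14 of a UYVY block. The mask pb_shuffle_uv and the helper names are hypothetical; the patch may organize the chroma path differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask (not in the patch): U bytes (0, 4, 8, 12) go to
 * slots 0-3, V bytes (2, 6, 10, 14) to slots 4-7, the rest are zeroed. */
static const int8_t pb_shuffle_uv[16] = {
    0, 4, 8, 12, 2, 6, 10, 14, -1, -1, -1, -1, -1, -1, -1, -1
};

/* Scalar model of PSHUFB on one 128-bit lane (top bit set -> zero). */
static void pshufb_lane(uint8_t dst[16], const uint8_t src[16],
                        const int8_t mask[16])
{
    for (int i = 0; i < 16; i++) {
        uint8_t m = (uint8_t)mask[i];
        dst[i] = (m & 0x80) ? 0 : src[m & 0x0F];
    }
}

/* Split one 16-byte UYVY block (8 pixels) into 4 U and 4 V samples. */
static void split_uv_block(uint8_t u[4], uint8_t v[4], const uint8_t src[16])
{
    uint8_t tmp[16];
    pshufb_lane(tmp, src, pb_shuffle_uv);
    for (int i = 0; i < 4; i++) {
        u[i] = tmp[i];
        v[i] = tmp[4 + i];
    }
}

/* Hypothetical self-check with U = 10.., V = 20.., Y = 0xFF. */
static void check_uv_split(void)
{
    uint8_t src[16], u[4], v[4];
    for (int k = 0; k < 4; k++) {
        src[4 * k]     = (uint8_t)(10 + k);
        src[4 * k + 1] = 0xFF;
        src[4 * k + 2] = (uint8_t)(20 + k);
        src[4 * k + 3] = 0xFF;
    }
    split_uv_block(u, v, src);
    for (int k = 0; k < 4; k++) {
        assert(u[k] == 10 + k);
        assert(v[k] == 20 + k);
    }
}
```

Packing U and V into adjacent 4-byte slots lets one shuffle serve both planes; the two halves are then stored to udst and vdst separately.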

Regards,
Min Chen

At 2021-09-30 09:56:11, "Wu Jianhua" <jianhua.wu at intel.com> wrote:
>With AVX2 acceleration, uyvytoyuv422 becomes faster.
>
>Performance data (lower is better):
>    uyvytoyuv422_sse2    0.50388
>    uyvytoyuv422_avx     0.46132
>    uyvytoyuv422_avx2    0.27309
>
>Signed-off-by: Wu Jianhua <jianhua.wu at intel.com>
>---
> libswscale/x86/rgb2rgb.c     |  6 ++++
> libswscale/x86/rgb_2_rgb.asm | 60 ++++++++++++++++++++++++++++--------
> 2 files changed, 53 insertions(+), 13 deletions(-)
>
>diff --git a/libswscale/x86/rgb2rgb.c b/libswscale/x86/rgb2rgb.c
>index c9ff33ab77..a965a1755c 100644
>--- a/libswscale/x86/rgb2rgb.c
>+++ b/libswscale/x86/rgb2rgb.c
>@@ -164,6 +164,9 @@ void ff_uyvytoyuv422_sse2(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
> void ff_uyvytoyuv422_avx(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
>                          const uint8_t *src, int width, int height,
>                          int lumStride, int chromStride, int srcStride);
>+void ff_uyvytoyuv422_avx2(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
>+                          const uint8_t *src, int width, int height,
>+                          int lumStride, int chromStride, int srcStride);
> #endif
> 

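When reviewing a SIMD path it helps to compare against a plain scalar version with the same parameters as the prototype above. The sketch below is a hypothetical uyvytoyuv422_ref (not libswscale's own C fallback), assuming an even width and UYVY ordering U0 Y0 V0 Y1 per pixel pair; in 4:2:2 output the chroma planes are half width but full height.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference with the same parameters as
 * ff_uyvytoyuv422_avx2. Packed input per pixel pair: U0 Y0 V0 Y1.
 * 4:2:2 output: full-width Y, half-width U and V, same height. */
static void uyvytoyuv422_ref(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
                             const uint8_t *src, int width, int height,
                             int lumStride, int chromStride, int srcStride)
{
    assert(width % 2 == 0);               /* sketch assumes even width */
    for (int y = 0; y < height; y++) {
        for (int i = 0; i < width / 2; i++) {
            udst[i]         = src[4 * i];
            ydst[2 * i]     = src[4 * i + 1];
            vdst[i]         = src[4 * i + 2];
            ydst[2 * i + 1] = src[4 * i + 3];
        }
        ydst += lumStride;                /* advance each plane one row */
        udst += chromStride;
        vdst += chromStride;
        src  += srcStride;
    }
}
```

Comparing its output byte for byte against the AVX2 path, with widths that are not a multiple of the SIMD block size and with padded strides, is one way to exercise the edge cases.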
>
>_______________________________________________
>ffmpeg-devel mailing list
>ffmpeg-devel at ffmpeg.org
>https://ffmpeg.org/mailman/listinfo/ffmpeg-devel