[FFmpeg-devel] [PATCH v2 3/4] libswscale/x86/rgb2rgb: add uyvytoyuv422 avx2
chen
chenm003 at 163.com
Thu Sep 30 05:29:18 EEST 2021
Hello,
>+pb_shuffle_low: times 4 db 1, 3, 5, 7, 9, 11, 13, 15, -1, -1, -1, -1, -1, -1, -1, -1
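Why do we need the "times 4" here?
AVX2 provides the VPBROADCASTQ instruction, which can broadcast a constant like this into a SIMD register at load time, so it does not have to be stored repeatedly.
Moreover, the same approach could be applied to the U/V planes for a further improvement.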
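
To illustrate the point, here is a minimal scalar sketch in C, not code from the patch. It models an in-lane byte shuffle (VPSHUFB semantics) with a mask that is stored once and broadcast to both 128-bit lanes. All function names are hypothetical. Note that the sketch assumes a 128-bit broadcast (as VBROADCASTI128 does), because the pattern including its -1 padding is 16 bytes long, while VPBROADCASTQ replicates a 64-bit element.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The 16-byte pattern from the patch, stored once (no "times"). */
static const uint8_t pb_shuffle_low_once[16] = {
    1, 3, 5, 7, 9, 11, 13, 15,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF   /* -1: output byte is zeroed */
};

/* Scalar model of a 256-bit VPSHUFB: each 128-bit lane is shuffled
 * independently; a mask byte with bit 7 set produces zero. */
static void model_vpshufb256(uint8_t dst[32], const uint8_t src[32],
                             const uint8_t mask[32])
{
    uint8_t tmp[32];
    for (int i = 0; i < 32; i++) {
        int lane = i & ~15;                 /* 0 = low lane, 16 = high lane */
        tmp[i] = (mask[i] & 0x80) ? 0 : src[lane + (mask[i] & 0x0F)];
    }
    memcpy(dst, tmp, 32);
}

/* Scalar model of a 128-bit broadcast load: the same 16 bytes land in
 * both lanes of the 256-bit register. */
static void model_broadcast128(uint8_t dst[32], const uint8_t pattern[16])
{
    memcpy(dst, pattern, 16);
    memcpy(dst + 16, pattern, 16);
}

/* Shuffle 32 source bytes using the broadcast mask. For UYVY input
 * (U0 Y0 V0 Y1 ...), the odd bytes selected by the mask are luma. */
static void shuffle_with_broadcast_mask(uint8_t dst[32], const uint8_t src[32])
{
    uint8_t mask[32];
    model_broadcast128(mask, pb_shuffle_low_once);
    model_vpshufb256(dst, src, mask);
}

/* Check that the broadcast mask equals a table holding the pattern twice,
 * which is what a repeated constant in memory would provide per register. */
static void self_check(void)
{
    uint8_t table[32], mask[32];
    memcpy(table, pb_shuffle_low_once, 16);
    memcpy(table + 16, pb_shuffle_low_once, 16);
    model_broadcast128(mask, pb_shuffle_low_once);
    assert(memcmp(table, mask, 32) == 0);
}
```

Calling shuffle_with_broadcast_mask() on 32 bytes of UYVY data leaves the eight luma bytes of each lane in that lane's low half and zeros in the high half, the same result a repeated in-memory table would give.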
Regards,
Min Chen
At 2021-09-30 09:56:11, "Wu Jianhua" <jianhua.wu at intel.com> wrote:
>With the accelerating by means of AVX2, the uyvytoyuv422 can be faster
>
>Performance data(Less is better):
> uyvytoyuv422_sse2 0.50388
> uyvytoyuv422_avx 0.46132
> uyvytoyuv422_avx2 0.27309
>
>Signed-off-by: Wu Jianhua <jianhua.wu at intel.com>
>---
> libswscale/x86/rgb2rgb.c | 6 ++++
> libswscale/x86/rgb_2_rgb.asm | 60 ++++++++++++++++++++++++++++--------
> 2 files changed, 53 insertions(+), 13 deletions(-)
>
>diff --git a/libswscale/x86/rgb2rgb.c b/libswscale/x86/rgb2rgb.c
>index c9ff33ab77..a965a1755c 100644
>--- a/libswscale/x86/rgb2rgb.c
>+++ b/libswscale/x86/rgb2rgb.c
>@@ -164,6 +164,9 @@ void ff_uyvytoyuv422_sse2(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
> void ff_uyvytoyuv422_avx(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
> const uint8_t *src, int width, int height,
> int lumStride, int chromStride, int srcStride);
>+void ff_uyvytoyuv422_avx2(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
>+ const uint8_t *src, int width, int height,
>+ int lumStride, int chromStride, int srcStride);
> #endif
>
>