[FFmpeg-devel] [PATCH v2 3/4] libswscale/x86/rgb2rgb: add uyvytoyuv422 avx2

Wu, Jianhua jianhua.wu at intel.com
Thu Sep 30 11:01:50 EEST 2021


Min Chen wrote:
> At 2021-09-30 15:23:08, "Wu, Jianhua" <jianhua.wu at intel.com> wrote:
> >Min Chen wrote:
> >> Sent: Thursday, September 30, 2021 10:29 AM
> >> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> >> Subject: Re: [FFmpeg-devel] [PATCH v2 3/4] libswscale/x86/rgb2rgb: add uyvytoyuv422 avx2
> >>
> >> Hello,
> >>
> >> >+pb_shuffle_low: times 4 db 1, 3, 5, 7, 9, 11, 13, 15, -1, -1, -1, -1, -1, -1, -1, -1
> >> Why do we need times 4?
> >> AVX2 provides the VPBROADCASTQ instruction to load these constants into
> >> a SIMD register.
> >>
> >> Moreover, the same approach could be applied to the U/V planes for a
> >> further improvement.
> >>
> >> Regards,
> >> Min Chen
> >>
> >Hi Min Chen,
> >
> >Your helpful suggestions are much appreciated.
> >
> >Correct! It's not necessary to use times 4 here. Funnily enough, I did
> >try to avoid it when writing the code, but found no way because I
> >overlooked the VBROADCASTI128 instruction.
> >
> >About the UV extraction, I evaluated the new method before deciding to
> >keep the previous author's excellent code. The existing approach is
> >better, since the pand instruction has a better reciprocal throughput
> >(also called issue latency).
> >
> >Best regards,
> >Jianhua
> 
> 
> 
> Regarding VBROADCASTI128: we don't care about the high part of the result,
> so we only need the lowest 64 bits of the constant table. VPBROADCASTQ is enough.
> 
> 

Definitely makes sense. Thanks.
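To illustrate the VPBROADCASTQ point: vpshufb shuffles each 128-bit lane independently, and a mask byte with its top bit set (such as -1) produces a zero byte. The scalar C model below is a sketch of the instruction semantics only, not the patch's actual loop, and the helper names (vpshufb_model, broadcast_mask_matches) are hypothetical. It checks that a mask built by broadcasting just the low 8 bytes of pb_shuffle_low, as VPBROADCASTQ would, yields the same low 8 bytes in each lane as the full 16-byte pattern loaded from the "times N" table; only the high half of each lane changes, from zeros to repeated luma bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 16-byte pattern from the patch: gather the odd bytes (luma in UYVY)
 * into the low 8 bytes of a lane; -1 (top bit set) zeroes the high 8. */
static const int8_t pb_shuffle_low[16] = {
    1, 3, 5, 7, 9, 11, 13, 15, -1, -1, -1, -1, -1, -1, -1, -1
};

/* Scalar model of AVX2 vpshufb on a 32-byte register: each 128-bit lane
 * is shuffled independently and can only read bytes from its own lane. */
static void vpshufb_model(uint8_t dst[32], const uint8_t src[32],
                          const int8_t mask[32])
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 16; i++) {
            int8_t m = mask[lane * 16 + i];
            if (m < 0) {
                dst[lane * 16 + i] = 0;  /* top bit set: output byte is 0 */
            } else {
                /* hardware uses only the low 4 bits; these masks stay in 0..15 */
                assert(m < 16);
                dst[lane * 16 + i] = src[lane * 16 + m];
            }
        }
    }
}

/* Returns 1 if the mask read from a repeated 16-byte table and the mask made
 * by broadcasting only its low 8 bytes (VPBROADCASTQ) give the same low
 * 8 bytes in both 128-bit lanes for this input. */
static int broadcast_mask_matches(const uint8_t src[32])
{
    int8_t mask_table[32], mask_bcast[32];
    uint8_t out_table[32], out_bcast[32];

    /* what a 32-byte load from the repeated table sees */
    memcpy(mask_table, pb_shuffle_low, 16);
    memcpy(mask_table + 16, pb_shuffle_low, 16);

    /* what VPBROADCASTQ of the first 8 bytes puts in every qword */
    for (int q = 0; q < 4; q++)
        memcpy(mask_bcast + q * 8, pb_shuffle_low, 8);

    vpshufb_model(out_table, src, mask_table);
    vpshufb_model(out_bcast, src, mask_bcast);

    /* Only the high 8 bytes of each lane differ (zeros vs. repeated luma),
     * which is the part of the result the discussion says is not needed. */
    return memcmp(out_table, out_bcast, 8) == 0 &&
           memcmp(out_table + 16, out_bcast + 16, 8) == 0;
}
```

Since the broadcast mask repeats the same 8 indices in every qword, this check holds for any input; the saving is that only an 8-byte constant has to live in the data section.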


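On the UV side, the thread contrasts a byte shuffle with the existing pand-based extraction. The scalar C sketch below models a generic mask-and-pack deinterleave of one UYVY row built from three word operations: pand with a 0x00FF word mask (keep the low byte), psrlw by 8 (move the high byte down), and packuswb (pack words to bytes with unsigned saturation). It is an assumption about the general technique, not a copy of the code in libswscale, and the function names are hypothetical; a plain reference implementation is included for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian 16-bit word from two bytes, as SIMD word ops see memory. */
static uint16_t load_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* packuswb on one word: signed 16-bit input, saturated to 0..255. */
static uint8_t packuswb_word(uint16_t w)
{
    int16_t s = (int16_t)w;
    return s < 0 ? 0 : s > 255 ? 255 : (uint8_t)s;
}

/* UYVY row (U0 Y0 V0 Y1 ...) to planar Y/U/V using only mask, shift and
 * pack steps; width is in pixels and must be even for 4:2:2. */
static void uyvy_row_mask_pack(const uint8_t *src, uint8_t *y, uint8_t *u,
                               uint8_t *v, int width)
{
    assert(width > 0 && width % 2 == 0);
    for (int k = 0; k < width / 2; k++) {
        uint16_t w0 = load_word(src + 4 * k);      /* U  | Y0 << 8 */
        uint16_t w1 = load_word(src + 4 * k + 2);  /* V  | Y1 << 8 */

        /* luma: psrlw 8, then packuswb */
        y[2 * k]     = packuswb_word((uint16_t)(w0 >> 8));
        y[2 * k + 1] = packuswb_word((uint16_t)(w1 >> 8));

        /* interleaved chroma: pand 0x00FF, then packuswb -> U V bytes */
        uint16_t uv = (uint16_t)(packuswb_word(w0 & 0x00FF) |
                                 (packuswb_word(w1 & 0x00FF) << 8));

        /* split chroma: pand 0x00FF gives U, psrlw 8 gives V */
        u[k] = packuswb_word(uv & 0x00FF);
        v[k] = packuswb_word((uint16_t)(uv >> 8));
    }
}

/* Straightforward byte-indexing reference for the same conversion. */
static void uyvy_row_reference(const uint8_t *src, uint8_t *y, uint8_t *u,
                               uint8_t *v, int width)
{
    for (int k = 0; k < width / 2; k++) {
        u[k]         = src[4 * k + 0];
        y[2 * k]     = src[4 * k + 1];
        v[k]         = src[4 * k + 2];
        y[2 * k + 1] = src[4 * k + 3];
    }
}
```

Because every masked or shifted word is already in 0..255, packuswb never saturates here, so the mask-and-pack path matches the reference byte for byte; which variant is faster on real hardware depends on the per-instruction throughput figures mentioned above.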

