[FFmpeg-devel] [FFFjo] [FFmpeg/FFmpeg] swscale: Implement neon assembly for yuv2nv12cx and yuv2planeX_10 (PR #20028)

dashsantosh-mcw code at ffmpeg.org
Thu Jul 24 10:13:23 EEST 2025


Checkasm benchmark results (lower is better; the factor in parentheses is the speedup over the C reference):

yuv2nv12cX_2_512_accurate_c:                          3496.2 ( 1.00x)
yuv2nv12cX_2_512_accurate_neon:                        409.5 ( 8.54x)
yuv2nv12cX_2_512_approximate_c:                       3495.1 ( 1.00x)
yuv2nv12cX_2_512_approximate_neon:                     409.4 ( 8.54x)
yuv2nv12cX_4_512_accurate_c:                          4676.5 ( 1.00x)
yuv2nv12cX_4_512_accurate_neon:                        613.1 ( 7.63x)
yuv2nv12cX_4_512_approximate_c:                       4677.8 ( 1.00x)
yuv2nv12cX_4_512_approximate_neon:                     607.8 ( 7.70x)
yuv2nv12cX_8_512_accurate_c:                          7221.6 ( 1.00x)
yuv2nv12cX_8_512_accurate_neon:                       1003.8 ( 7.19x)
yuv2nv12cX_8_512_approximate_c:                       7221.2 ( 1.00x)
yuv2nv12cX_8_512_approximate_neon:                    1016.4 ( 7.11x)
yuv2nv12cX_16_512_accurate_c:                        13731.1 ( 1.00x)
yuv2nv12cX_16_512_accurate_neon:                      1757.2 ( 7.81x)
yuv2nv12cX_16_512_approximate_c:                     13740.7 ( 1.00x)
yuv2nv12cX_16_512_approximate_neon:                   1757.3 ( 7.82x)
yuv2yuvX_10_LE_16_0_512_accurate_c:                   7836.9 ( 1.00x)
yuv2yuvX_10_LE_16_0_512_accurate_neon:                 840.4 ( 9.33x)
yuv2yuvX_10_LE_16_0_512_approximate_c:                7930.8 ( 1.00x)
yuv2yuvX_10_LE_16_0_512_approximate_neon:              838.5 ( 9.46x)
yuv2yuvX_10_LE_16_16_512_accurate_c:                  7594.3 ( 1.00x)
yuv2yuvX_10_LE_16_16_512_accurate_neon:                815.2 ( 9.32x)
yuv2yuvX_10_LE_16_16_512_approximate_c:               7687.0 ( 1.00x)
yuv2yuvX_10_LE_16_16_512_approximate_neon:             811.9 ( 9.47x)
yuv2yuvX_10_LE_16_32_512_accurate_c:                  7366.4 ( 1.00x)
yuv2yuvX_10_LE_16_32_512_accurate_neon:                785.8 ( 9.37x)
yuv2yuvX_10_LE_16_32_512_approximate_c:               7426.5 ( 1.00x)
yuv2yuvX_10_LE_16_32_512_approximate_neon:             786.4 ( 9.44x)
yuv2yuvX_10_LE_16_48_512_accurate_c:                  7123.1 ( 1.00x)
yuv2yuvX_10_LE_16_48_512_accurate_neon:                761.7 ( 9.35x)
yuv2yuvX_10_LE_16_48_512_approximate_c:               7182.7 ( 1.00x)
yuv2yuvX_10_LE_16_48_512_approximate_neon:             763.0 ( 9.41x)
yuv2yuvX_10_BE_16_0_512_accurate_c:                   8092.6 ( 1.00x)
yuv2yuvX_10_BE_16_0_512_accurate_neon:                 860.2 ( 9.41x)
yuv2yuvX_10_BE_16_0_512_approximate_c:                8183.5 ( 1.00x)
yuv2yuvX_10_BE_16_0_512_approximate_neon:              861.4 ( 9.50x)
yuv2yuvX_10_BE_16_16_512_accurate_c:                  7837.4 ( 1.00x)
yuv2yuvX_10_BE_16_16_512_accurate_neon:                834.0 ( 9.40x)
yuv2yuvX_10_BE_16_16_512_approximate_c:               7927.9 ( 1.00x)
yuv2yuvX_10_BE_16_16_512_approximate_neon:             834.6 ( 9.50x)
yuv2yuvX_10_BE_16_32_512_accurate_c:                  7605.1 ( 1.00x)
yuv2yuvX_10_BE_16_32_512_accurate_neon:                807.5 ( 9.42x)
yuv2yuvX_10_BE_16_32_512_approximate_c:               7691.4 ( 1.00x)
yuv2yuvX_10_BE_16_32_512_approximate_neon:             807.3 ( 9.53x)
yuv2yuvX_10_BE_16_48_512_accurate_c:                  7344.3 ( 1.00x)
yuv2yuvX_10_BE_16_48_512_accurate_neon:                782.7 ( 9.38x)
yuv2yuvX_10_BE_16_48_512_approximate_c:               7440.1 ( 1.00x)
yuv2yuvX_10_BE_16_48_512_approximate_neon:             781.9 ( 9.51x)
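
The speedup column can be cross-checked by dividing each C time by the matching NEON time. Below is a minimal Python sketch of that arithmetic for three rows taken from the results above; the ROWS list and the speedup helper are illustrative names only and are not part of checkasm. Because the printed times are rounded to one decimal, a recomputed factor may differ from checkasm's own column in the last digit: for example, 7221.2 / 1016.4 gives about 7.10 for yuv2nv12cX_8_512_approximate, while the table reports 7.11 (presumably because checkasm divides unrounded timings).

```python
# Rows copied from the checkasm results above: (test name, C time, NEON time).
# Lower times are better; the unit is whatever checkasm's timer reports.
ROWS = [
    ("yuv2nv12cX_2_512_accurate", 3496.2, 409.5),
    ("yuv2nv12cX_16_512_accurate", 13731.1, 1757.2),
    ("yuv2yuvX_10_LE_16_0_512_accurate", 7836.9, 840.4),
]


def speedup(c_time: float, neon_time: float) -> float:
    """Speedup factor as shown in parentheses: C time divided by NEON time."""
    return c_time / neon_time


for name, c_time, neon_time in ROWS:
    # Two decimals, matching the "( 8.54x)" style of the checkasm column.
    print(f"{name}: {speedup(c_time, neon_time):.2f}x")
```

For these three rows the printed factors are 8.54x, 7.81x and 9.33x, the same as the values checkasm reported.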

---
View it on FFmpeg Forgejo ( https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20028 ) or reply to this email directly.