[FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes
James Almer
jamrial at gmail.com
Sat Jan 25 17:39:18 EET 2025
On 1/25/2025 12:11 PM, Shreesh Adiga wrote:
>> Thanks for the patch. Could you please compile and run
>> tests/checkasm/checkasm with "--test=sw_rgb --bench" and paste the
>> results for the shuffle_bytes functions, to see if there's a speed up
>> compared to the AVX2 implementation?
>
> I ran the command "tests/checkasm/checkasm --test=sw_rgb --bench" and got
> the following output:
> benchmarking with native FFmpeg timers
> nop: 45.0
> checkasm: using random seed 17575157
> checkasm: bench runs 1024 (1 << 10)
> SSE2:
> - sw_rgb.uyvytoyuv422 [OK]
> - sw_rgb.interleave_bytes [OK]
> - sw_rgb.deinterleave_bytes [OK]
> - sw_rgb.rgb_to_y [OK]
> - sw_rgb.rgb_to_uv [OK]
> SSSE3:
> - sw_rgb.shuffle_bytes_2103 [OK]
> - sw_rgb.shuffle_bytes_0321 [OK]
> - sw_rgb.shuffle_bytes_1230 [OK]
> - sw_rgb.shuffle_bytes_3012 [OK]
> - sw_rgb.shuffle_bytes_3210 [OK]
> - sw_rgb.rgb_to_y [OK]
> - sw_rgb.rgb_to_uv [OK]
> AVX:
> - sw_rgb.uyvytoyuv422 [OK]
> - sw_rgb.deinterleave_bytes [OK]
> - sw_rgb.rgb_to_y [OK]
> - sw_rgb.rgb_to_uv [OK]
> AVX2:
> - sw_rgb.shuffle_bytes_2103 [OK]
> - sw_rgb.shuffle_bytes_0321 [OK]
> - sw_rgb.shuffle_bytes_1230 [OK]
> - sw_rgb.shuffle_bytes_3012 [OK]
> - sw_rgb.shuffle_bytes_3210 [OK]
> - sw_rgb.uyvytoyuv422 [OK]
> - sw_rgb.rgb_to_y [OK]
> - sw_rgb.rgb_to_uv [OK]
> AVX-512ICL:
> - sw_rgb.shuffle_bytes_2103 [OK]
> - sw_rgb.shuffle_bytes_0321 [OK]
> - sw_rgb.shuffle_bytes_1230 [OK]
> - sw_rgb.shuffle_bytes_3012 [OK]
> - sw_rgb.shuffle_bytes_3210 [OK]
> checkasm: all 184 tests passed
> shuffle_bytes_0321_c: 45.0 ( 1.00x)
> shuffle_bytes_0321_ssse3: 11.2 ( 4.00x)
> shuffle_bytes_0321_avx2: 11.2 ( 4.00x)
> shuffle_bytes_0321_avx512icl: 11.2 ( 4.00x)
> shuffle_bytes_1230_c: 67.5 ( 1.00x)
> shuffle_bytes_1230_ssse3: 11.2 ( 6.00x)
> shuffle_bytes_1230_avx2: 11.2 ( 6.00x)
> shuffle_bytes_1230_avx512icl: 0.0 ( 0.00x)
> shuffle_bytes_2103_c: 45.0 ( 1.00x)
> shuffle_bytes_2103_ssse3: 11.2 ( 4.00x)
> shuffle_bytes_2103_avx2: 0.0 ( 0.00x)
> shuffle_bytes_2103_avx512icl: 0.0 ( 0.00x)
> shuffle_bytes_3012_c: 67.5 ( 1.00x)
> shuffle_bytes_3012_ssse3: 11.2 ( 6.00x)
> shuffle_bytes_3012_avx2: 11.2 ( 6.00x)
> shuffle_bytes_3012_avx512icl: 0.0 ( 0.00x)
> shuffle_bytes_3210_c: 67.5 ( 1.00x)
> shuffle_bytes_3210_ssse3: 11.2 ( 6.00x)
> shuffle_bytes_3210_avx2: 11.2 ( 6.00x)
> shuffle_bytes_3210_avx512icl: 0.0 ( 0.00x)
>
> I've omitted the other functions printed by the bench command.
> I'm not sure whether I'm missing something, but the output doesn't look
> consistent to me: there are many 0.0 readings, and I don't see any
> difference between SSSE3 and AVX2 either.
> I'm running this on an AMD Ryzen 7950X (Zen 4) machine.
>
> I've inspected the generated assembly for one of the SSSE3/AVX2/AVX-512
> functions, and it looks as I expected, so I'm not sure checkasm is
> measuring accurately here.
> Please let me know if I'm missing something; I'm new to FFmpeg
> development and this is my first patch submission.
Try running it several times with the same seed, i.e.
"tests/checkasm/checkasm --test=sw_rgb --bench 17575157", and make sure
no power-saving feature is enabled (so the CPU frequency doesn't change
based on load). That may help you get consistent results.
For example, on my Intel Core i7-12700K I get:
shuffle_bytes_0321_c: 27.8 ( 1.00x)
shuffle_bytes_0321_ssse3: 8.3 ( 3.35x)
shuffle_bytes_0321_avx2: 6.3 ( 4.41x)
shuffle_bytes_1230_c: 51.8 ( 1.00x)
shuffle_bytes_1230_ssse3: 8.3 ( 6.24x)
shuffle_bytes_1230_avx2: 6.3 ( 8.22x)
shuffle_bytes_2103_c: 28.8 ( 1.00x)
shuffle_bytes_2103_ssse3: 8.3 ( 3.47x)
shuffle_bytes_2103_avx2: 6.3 ( 4.57x)
shuffle_bytes_3012_c: 52.8 ( 1.00x)
shuffle_bytes_3012_ssse3: 8.3 ( 6.36x)
shuffle_bytes_3012_avx2: 6.3 ( 8.38x)
shuffle_bytes_3210_c: 51.8 ( 1.00x)
shuffle_bytes_3210_ssse3: 8.3 ( 6.24x)
shuffle_bytes_3210_avx2: 5.8 ( 8.93x)
Otherwise, maybe someone else with an AVX512ICL-enabled CPU can test it
to confirm.