[FFmpeg-devel] [PATCH v4 0/8] swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats
Ramiro Polla
ramiro.polla at gmail.com
Sun Dec 1 20:20:02 EET 2024
changes from v3:
- removed left-over FFMIN() on input in lumRangeToJpeg16_c();
- restored cast to signed int before right shift that was mistakenly removed in chrRangeToJpeg16_c();
- restored disabling of aarch64 simd functions after changing to new API;
- added test for negative input values;
- fixed {lum,chr}ConvertRange16 for negative input (dropped sse2 implementation since it does not have pmuldq);
- reordered commits;
- reran all benchmarks.
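For readers unfamiliar with the two ideas mentioned in the list above (saturating the output instead of limiting the input, and casting to signed int before the right shift), here is a minimal sketch. It is not the code from the patchset: the function name, the 8-bit sample depth, and the SKETCH_* constants are assumptions chosen only to keep the arithmetic easy to follow.

```c
#include <stdint.h>

/* Hypothetical constants for an 8-bit limited->full luma conversion.
 * They are illustrative only and are not taken from the patchset. */
#define SKETCH_SHIFT  14
#define SKETCH_COEFF  19077  /* round(255.0 / 219.0 * (1 << 14)) */
#define SKETCH_OFFSET (-16 * SKETCH_COEFF + (1 << (SKETCH_SHIFT - 1)))

/* Clamp an int to the 8-bit range 0..255. */
static int sketch_clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : v;
}

/* Convert one limited-range (16..235) luma sample to full range (0..255).
 * The input is NOT limited beforehand; the result is saturated instead,
 * so inputs outside the nominal range (including negative ones) are
 * handled by the final clamp. */
static uint8_t lum_to_full_sketch(int in)
{
    /* Multiply and add in unsigned arithmetic to avoid signed overflow,
     * then cast back to signed int BEFORE shifting, so that a negative
     * intermediate is shifted arithmetically (as on common compilers)
     * rather than turning into a huge positive value. */
    int v = (int)((unsigned)in * SKETCH_COEFF + (unsigned)SKETCH_OFFSET)
            >> SKETCH_SHIFT;
    return (uint8_t)sketch_clip_u8(v);
}
```

With these assumed constants, 16 maps to 0 and 235 maps to 255, while out-of-range inputs such as 255 or -5 are clamped to 255 and 0 respectively. Without the cast, the shift would operate on an unsigned value and a negative intermediate would end up saturating to the maximum instead of the minimum.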
checkasm --bench for entire patchset:
x86_64:
chrRangeFromJpeg8_1920_c: 2126.5 2114.7 (1.01x)
chrRangeFromJpeg8_1920_sse2: 817.0 814.2 (1.00x)
chrRangeFromJpeg8_1920_avx2: 404.4 405.5 (1.00x)
chrRangeFromJpeg16_1920_c: 2331.4 3153.9 (0.74x)
chrRangeToJpeg8_1920_c: 3163.0 3163.9 (1.00x)
chrRangeToJpeg8_1920_sse2: 814.5 814.8 (1.00x)
chrRangeToJpeg8_1920_avx2: 404.4 405.7 (1.00x)
chrRangeToJpeg16_1920_c: 3163.7 3165.0 (1.00x)
lumRangeFromJpeg8_1920_c: 1262.2 1306.8 (0.97x)
lumRangeFromJpeg8_1920_sse2: 411.9 414.4 (0.99x)
lumRangeFromJpeg8_1920_avx2: 206.9 206.0 (1.00x)
lumRangeFromJpeg16_1920_c: 1079.5 1298.5 (0.83x)
lumRangeToJpeg8_1920_c: 1860.5 1906.0 (0.98x)
lumRangeToJpeg8_1920_sse2: 411.9 412.9 (1.00x)
lumRangeToJpeg8_1920_avx2: 198.9 205.9 (0.97x)
lumRangeToJpeg16_1920_c: 1910.2 1905.0 (1.00x)
aarch64 A55:
chrRangeFromJpeg8_1920_c: 28836.2 28836.8 (1.00x)
chrRangeFromJpeg8_1920_neon: 5312.6 5310.2 (1.00x)
chrRangeFromJpeg16_1920_c: 28840.1 32684.2 (0.88x)
chrRangeToJpeg8_1920_c: 44196.2 23073.2 (1.92x)
chrRangeToJpeg8_1920_neon: 6034.6 5547.4 (1.09x)
chrRangeToJpeg16_1920_c: 36527.3 24996.8 (1.46x)
lumRangeFromJpeg8_1920_c: 15388.5 15383.5 (1.00x)
lumRangeFromJpeg8_1920_neon: 3150.7 3147.4 (1.00x)
lumRangeFromJpeg16_1920_c: 15389.3 17305.2 (0.89x)
lumRangeToJpeg8_1920_c: 23069.7 19226.2 (1.20x)
lumRangeToJpeg8_1920_neon: 3873.2 3627.8 (1.07x)
lumRangeToJpeg16_1920_c: 19227.8 21144.8 (0.91x)
aarch64 A76:
chrRangeFromJpeg8_1920_c: 6334.7 6263.8 (1.01x)
chrRangeFromJpeg8_1920_neon: 2264.5 2307.0 (0.98x)
chrRangeFromJpeg16_1920_c: 6336.0 11523.8 (0.55x)
chrRangeToJpeg8_1920_c: 11474.5 9610.4 (1.19x)
chrRangeToJpeg8_1920_neon: 2646.5 2794.2 (0.95x)
chrRangeToJpeg16_1920_c: 9640.5 11655.2 (0.83x)
lumRangeFromJpeg8_1920_c: 4453.2 4420.8 (1.01x)
lumRangeFromJpeg8_1920_neon: 1104.8 1107.0 (1.00x)
lumRangeFromJpeg16_1920_c: 4414.2 5762.0 (0.77x)
lumRangeToJpeg8_1920_c: 6645.0 5980.8 (1.11x)
lumRangeToJpeg8_1920_neon: 1310.5 1334.0 (0.98x)
lumRangeToJpeg16_1920_c: 6005.2 5946.2 (1.01x)
Ramiro Polla (8):
checkasm/sw_range_convert: test negative input values
swscale/range_convert: saturate output instead of limiting input
swscale/aarch64/range_convert: saturate output instead of limiting input
swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats
swscale/x86/range_convert: update sse2 and avx2 range_convert functions to new API
swscale/aarch64/range_convert: update neon range_convert functions to new API
swscale/x86: add sse4 and avx2 {lum,chr}ConvertRange16
swscale/aarch64: add neon {lum,chr}ConvertRange16
libswscale/aarch64/range_convert_neon.S | 152 ++++++++++----
libswscale/aarch64/swscale.c | 36 +++-
libswscale/hscale.c | 6 +-
libswscale/loongarch/swscale_init_loongarch.c | 5 +
libswscale/riscv/swscale.c | 5 +
libswscale/swscale.c | 122 ++++++++++--
libswscale/swscale_internal.h | 26 ++-
libswscale/x86/range_convert.asm | 159 ++++++++++-----
libswscale/x86/swscale.c | 50 +++--
tests/checkasm/sw_range_convert.c | 82 +++++++-
.../fate/filter-alphaextract_alphamerge_rgb | 100 +++++-----
tests/ref/fate/filter-pixdesc-gray10be | 2 +-
tests/ref/fate/filter-pixdesc-gray10le | 2 +-
tests/ref/fate/filter-pixdesc-gray12be | 2 +-
tests/ref/fate/filter-pixdesc-gray12le | 2 +-
tests/ref/fate/filter-pixdesc-gray14be | 2 +-
tests/ref/fate/filter-pixdesc-gray14le | 2 +-
tests/ref/fate/filter-pixdesc-gray16be | 2 +-
tests/ref/fate/filter-pixdesc-gray16le | 2 +-
tests/ref/fate/filter-pixdesc-gray9be | 2 +-
tests/ref/fate/filter-pixdesc-gray9le | 2 +-
tests/ref/fate/filter-pixdesc-ya16be | 2 +-
tests/ref/fate/filter-pixdesc-ya16le | 2 +-
tests/ref/fate/filter-pixdesc-yuvj411p | 2 +-
tests/ref/fate/filter-pixdesc-yuvj420p | 2 +-
tests/ref/fate/filter-pixdesc-yuvj422p | 2 +-
tests/ref/fate/filter-pixdesc-yuvj440p | 2 +-
tests/ref/fate/filter-pixdesc-yuvj444p | 2 +-
tests/ref/fate/filter-pixfmts-copy | 34 ++--
tests/ref/fate/filter-pixfmts-crop | 34 ++--
tests/ref/fate/filter-pixfmts-field | 34 ++--
tests/ref/fate/filter-pixfmts-fieldorder | 30 +--
tests/ref/fate/filter-pixfmts-hflip | 34 ++--
tests/ref/fate/filter-pixfmts-il | 34 ++--
tests/ref/fate/filter-pixfmts-lut | 18 +-
tests/ref/fate/filter-pixfmts-null | 34 ++--
tests/ref/fate/filter-pixfmts-pad | 22 +--
tests/ref/fate/filter-pixfmts-pullup | 10 +-
tests/ref/fate/filter-pixfmts-rotate | 4 +-
tests/ref/fate/filter-pixfmts-scale | 34 ++--
tests/ref/fate/filter-pixfmts-swapuv | 10 +-
.../ref/fate/filter-pixfmts-tinterlace_cvlpf | 8 +-
.../ref/fate/filter-pixfmts-tinterlace_merge | 8 +-
tests/ref/fate/filter-pixfmts-tinterlace_pad | 8 +-
tests/ref/fate/filter-pixfmts-tinterlace_vlpf | 8 +-
tests/ref/fate/filter-pixfmts-transpose | 28 +--
tests/ref/fate/filter-pixfmts-vflip | 34 ++--
tests/ref/fate/fitsenc-gray | 2 +-
tests/ref/fate/fitsenc-gray16be | 10 +-
tests/ref/fate/gifenc-gray | 186 +++++++++---------
tests/ref/fate/idroq-video-encode | 2 +-
tests/ref/fate/jpg-icc | 8 +-
tests/ref/fate/sws-yuv-colorspace | 2 +-
tests/ref/fate/sws-yuv-range | 2 +-
tests/ref/fate/vvc-conformance-SCALING_A_1 | 128 ++++++------
tests/ref/lavf/gray16be.fits | 4 +-
tests/ref/lavf/gray16be.pam | 4 +-
tests/ref/lavf/gray16be.png | 6 +-
tests/ref/lavf/jpg | 6 +-
tests/ref/lavf/smjpeg | 6 +-
tests/ref/pixfmt/gbrp-gray | 2 +-
tests/ref/pixfmt/gbrp-gray10be | 2 +-
tests/ref/pixfmt/gbrp-gray10le | 2 +-
tests/ref/pixfmt/gbrp-gray12be | 2 +-
tests/ref/pixfmt/gbrp-gray12le | 2 +-
tests/ref/pixfmt/gbrp-gray16be | 2 +-
tests/ref/pixfmt/gbrp-gray16le | 2 +-
tests/ref/pixfmt/gbrp-yuvj420p | 2 +-
tests/ref/pixfmt/gbrp-yuvj422p | 2 +-
tests/ref/pixfmt/gbrp-yuvj440p | 2 +-
tests/ref/pixfmt/gbrp-yuvj444p | 2 +-
tests/ref/pixfmt/gbrp10-gray | 2 +-
tests/ref/pixfmt/gbrp10-gray10be | 2 +-
tests/ref/pixfmt/gbrp10-gray10le | 2 +-
tests/ref/pixfmt/gbrp10-gray12be | 2 +-
tests/ref/pixfmt/gbrp10-gray12le | 2 +-
tests/ref/pixfmt/gbrp10-gray16be | 2 +-
tests/ref/pixfmt/gbrp10-gray16le | 2 +-
tests/ref/pixfmt/gbrp10-yuvj420p | 2 +-
tests/ref/pixfmt/gbrp10-yuvj422p | 2 +-
tests/ref/pixfmt/gbrp10-yuvj440p | 2 +-
tests/ref/pixfmt/gbrp10-yuvj444p | 2 +-
tests/ref/pixfmt/gbrp12-gray | 2 +-
tests/ref/pixfmt/gbrp12-gray10be | 2 +-
tests/ref/pixfmt/gbrp12-gray10le | 2 +-
tests/ref/pixfmt/gbrp12-gray12be | 2 +-
tests/ref/pixfmt/gbrp12-gray12le | 2 +-
tests/ref/pixfmt/gbrp12-gray16be | 2 +-
tests/ref/pixfmt/gbrp12-gray16le | 2 +-
tests/ref/pixfmt/gbrp12-yuvj420p | 2 +-
tests/ref/pixfmt/gbrp12-yuvj422p | 2 +-
tests/ref/pixfmt/gbrp12-yuvj440p | 2 +-
tests/ref/pixfmt/gbrp12-yuvj444p | 2 +-
tests/ref/pixfmt/gbrp16-gray16be | 2 +-
tests/ref/pixfmt/gbrp16-gray16le | 2 +-
tests/ref/pixfmt/rgb24-gray | 2 +-
tests/ref/pixfmt/rgb24-gray10be | 2 +-
tests/ref/pixfmt/rgb24-gray10le | 2 +-
tests/ref/pixfmt/rgb24-gray12be | 2 +-
tests/ref/pixfmt/rgb24-gray12le | 2 +-
tests/ref/pixfmt/rgb24-gray16be | 2 +-
tests/ref/pixfmt/rgb24-gray16le | 2 +-
tests/ref/pixfmt/rgb24-yuvj420p | 2 +-
tests/ref/pixfmt/rgb24-yuvj422p | 2 +-
tests/ref/pixfmt/rgb24-yuvj440p | 2 +-
tests/ref/pixfmt/rgb24-yuvj444p | 2 +-
tests/ref/pixfmt/rgb48-gray | 2 +-
tests/ref/pixfmt/rgb48-gray10be | 2 +-
tests/ref/pixfmt/rgb48-gray10le | 2 +-
tests/ref/pixfmt/rgb48-gray12be | 2 +-
tests/ref/pixfmt/rgb48-gray12le | 2 +-
tests/ref/pixfmt/rgb48-gray16be | 2 +-
tests/ref/pixfmt/rgb48-gray16le | 2 +-
tests/ref/pixfmt/rgb48-yuvj420p | 2 +-
tests/ref/pixfmt/rgb48-yuvj422p | 2 +-
tests/ref/pixfmt/rgb48-yuvj440p | 2 +-
tests/ref/pixfmt/rgb48-yuvj444p | 2 +-
tests/ref/pixfmt/yuv444p-gray10be | 2 +-
tests/ref/pixfmt/yuv444p-gray10le | 2 +-
tests/ref/pixfmt/yuv444p-gray12be | 2 +-
tests/ref/pixfmt/yuv444p-gray12le | 2 +-
tests/ref/pixfmt/yuv444p-gray16be | 2 +-
tests/ref/pixfmt/yuv444p-gray16le | 2 +-
tests/ref/pixfmt/yuv444p-yuvj420p | 2 +-
tests/ref/pixfmt/yuv444p-yuvj422p | 2 +-
tests/ref/pixfmt/yuv444p-yuvj440p | 2 +-
tests/ref/pixfmt/yuv444p10-gray | 2 +-
tests/ref/pixfmt/yuv444p10-gray10be | 2 +-
tests/ref/pixfmt/yuv444p10-gray10le | 2 +-
tests/ref/pixfmt/yuv444p10-gray12be | 2 +-
tests/ref/pixfmt/yuv444p10-gray12le | 2 +-
tests/ref/pixfmt/yuv444p10-gray16be | 2 +-
tests/ref/pixfmt/yuv444p10-gray16le | 2 +-
tests/ref/pixfmt/yuv444p10-yuvj420p | 2 +-
tests/ref/pixfmt/yuv444p10-yuvj422p | 2 +-
tests/ref/pixfmt/yuv444p10-yuvj440p | 2 +-
tests/ref/pixfmt/yuv444p10-yuvj444p | 2 +-
tests/ref/pixfmt/yuv444p12-gray | 2 +-
tests/ref/pixfmt/yuv444p12-gray10be | 2 +-
tests/ref/pixfmt/yuv444p12-gray10le | 2 +-
tests/ref/pixfmt/yuv444p12-gray12be | 2 +-
tests/ref/pixfmt/yuv444p12-gray12le | 2 +-
tests/ref/pixfmt/yuv444p12-gray16be | 2 +-
tests/ref/pixfmt/yuv444p12-gray16le | 2 +-
tests/ref/pixfmt/yuv444p12-yuvj420p | 2 +-
tests/ref/pixfmt/yuv444p12-yuvj422p | 2 +-
tests/ref/pixfmt/yuv444p12-yuvj440p | 2 +-
tests/ref/pixfmt/yuv444p12-yuvj444p | 2 +-
tests/ref/pixfmt/yuv444p16-gray16be | 2 +-
tests/ref/pixfmt/yuv444p16-gray16le | 2 +-
tests/ref/pixfmt/yuvj420p | 2 +-
tests/ref/pixfmt/yuvj422p | 2 +-
tests/ref/pixfmt/yuvj440p | 2 +-
tests/ref/pixfmt/yuvj444p | 2 +-
tests/ref/seek/lavf-jpg | 8 +-
tests/ref/seek/vsynth_lena-mjpeg | 40 ++--
tests/ref/seek/vsynth_lena-roqvideo | 2 +-
tests/ref/vsynth/vsynth1-amv | 8 +-
tests/ref/vsynth/vsynth1-mjpeg | 6 +-
tests/ref/vsynth/vsynth1-mjpeg-422 | 6 +-
tests/ref/vsynth/vsynth1-mjpeg-444 | 6 +-
tests/ref/vsynth/vsynth1-mjpeg-huffman | 6 +-
tests/ref/vsynth/vsynth1-mjpeg-trell | 8 +-
tests/ref/vsynth/vsynth1-mjpeg-trell-huffman | 8 +-
tests/ref/vsynth/vsynth1-roqvideo | 8 +-
tests/ref/vsynth/vsynth2-amv | 6 +-
tests/ref/vsynth/vsynth2-mjpeg | 6 +-
tests/ref/vsynth/vsynth2-mjpeg-422 | 6 +-
tests/ref/vsynth/vsynth2-mjpeg-444 | 6 +-
tests/ref/vsynth/vsynth2-mjpeg-huffman | 6 +-
tests/ref/vsynth/vsynth2-mjpeg-trell | 8 +-
tests/ref/vsynth/vsynth2-mjpeg-trell-huffman | 8 +-
tests/ref/vsynth/vsynth2-roqvideo | 8 +-
tests/ref/vsynth/vsynth3-amv | 8 +-
tests/ref/vsynth/vsynth3-mjpeg | 8 +-
tests/ref/vsynth/vsynth3-mjpeg-422 | 8 +-
tests/ref/vsynth/vsynth3-mjpeg-444 | 6 +-
tests/ref/vsynth/vsynth3-mjpeg-huffman | 8 +-
tests/ref/vsynth/vsynth3-mjpeg-trell | 6 +-
tests/ref/vsynth/vsynth3-mjpeg-trell-huffman | 6 +-
tests/ref/vsynth/vsynth_lena-amv | 6 +-
tests/ref/vsynth/vsynth_lena-mjpeg | 8 +-
tests/ref/vsynth/vsynth_lena-mjpeg-422 | 6 +-
tests/ref/vsynth/vsynth_lena-mjpeg-444 | 6 +-
tests/ref/vsynth/vsynth_lena-mjpeg-huffman | 8 +-
tests/ref/vsynth/vsynth_lena-mjpeg-trell | 8 +-
.../vsynth/vsynth_lena-mjpeg-trell-huffman | 8 +-
tests/ref/vsynth/vsynth_lena-roqvideo | 8 +-
188 files changed, 1189 insertions(+), 836 deletions(-)
--
2.39.5