[FFmpeg-devel] [PATCH v4 0/8] swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats

Ramiro Polla ramiro.polla at gmail.com
Sun Dec 1 20:20:02 EET 2024


changes from v3:
- removed left-over FFMIN() on input in lumRangeToJpeg16_c();
- restored cast to signed int before right shift that was mistakenly removed in chrRangeToJpeg16_c();
- restored disabling of aarch64 simd functions after changing to new API;
- added test for negative input values;
- fixed {lum,chr}ConvertRange16 for negative input (dropped the sse2 implementation, since pmuldq is not available before sse4.1);
- reordered commits;
- reran all benchmarks;
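
For readers following the FFMIN() items above, here is a minimal sketch of the difference between limiting the input and saturating the output. It is not the code from the patch: the function names, the constants, and the rounding are assumptions chosen for illustration (8-bit luma held in 15-bit intermediates, 14-bit fixed-point gain).

```c
#include <stdint.h>

/* Illustrative constants, NOT taken from the patch. */
#define SKETCH_COEFF  19077                          /* ~ (255 / 219) * (1 << 14) */
#define SKETCH_OFFSET (-(2048 * 19077) + (1 << 13))  /* remove 16 << 7, add rounding */
#define SKETCH_MAX_IN 30189                          /* largest input whose result fits in int16_t */

static int sketch_min(int a, int b) { return a < b ? a : b; }

/* Limit the input so that the converted value can never exceed INT16_MAX. */
static void lum_to_jpeg_limit_input(int16_t *dst, int width)
{
    for (int i = 0; i < width; i++) {
        int in = sketch_min(dst[i], SKETCH_MAX_IN);
        dst[i] = (int16_t)((in * SKETCH_COEFF + SKETCH_OFFSET) >> 14);
    }
}

/* Convert the unmodified input, then saturate the result. */
static void lum_to_jpeg_saturate_output(int16_t *dst, int width)
{
    for (int i = 0; i < width; i++) {
        int y = (dst[i] * SKETCH_COEFF + SKETCH_OFFSET) >> 14;
        dst[i] = (int16_t)sketch_min(y, INT16_MAX);
    }
}
```

With these illustrative constants, an input of 32767 comes out as 32766 when the input is limited and as 32767 when the output is saturated. Negative intermediates rely on an arithmetic right shift, which is implementation-defined in C.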

checkasm --bench for the entire patchset (the value in parentheses is the first column divided by the second):

x86_64:
chrRangeFromJpeg8_1920_c:      2126.5   2114.7  (1.01x)
chrRangeFromJpeg8_1920_sse2:    817.0    814.2  (1.00x)
chrRangeFromJpeg8_1920_avx2:    404.4    405.5  (1.00x)
chrRangeFromJpeg16_1920_c:     2331.4   3153.9  (0.74x)
chrRangeToJpeg8_1920_c:        3163.0   3163.9  (1.00x)
chrRangeToJpeg8_1920_sse2:      814.5    814.8  (1.00x)
chrRangeToJpeg8_1920_avx2:      404.4    405.7  (1.00x)
chrRangeToJpeg16_1920_c:       3163.7   3165.0  (1.00x)
lumRangeFromJpeg8_1920_c:      1262.2   1306.8  (0.97x)
lumRangeFromJpeg8_1920_sse2:    411.9    414.4  (0.99x)
lumRangeFromJpeg8_1920_avx2:    206.9    206.0  (1.00x)
lumRangeFromJpeg16_1920_c:     1079.5   1298.5  (0.83x)
lumRangeToJpeg8_1920_c:        1860.5   1906.0  (0.98x)
lumRangeToJpeg8_1920_sse2:      411.9    412.9  (1.00x)
lumRangeToJpeg8_1920_avx2:      198.9    205.9  (0.97x)
lumRangeToJpeg16_1920_c:       1910.2   1905.0  (1.00x)

aarch64 A55:
chrRangeFromJpeg8_1920_c:     28836.2  28836.8  (1.00x)
chrRangeFromJpeg8_1920_neon:   5312.6   5310.2  (1.00x)
chrRangeFromJpeg16_1920_c:    28840.1  32684.2  (0.88x)
chrRangeToJpeg8_1920_c:       44196.2  23073.2  (1.92x)
chrRangeToJpeg8_1920_neon:     6034.6   5547.4  (1.09x)
chrRangeToJpeg16_1920_c:      36527.3  24996.8  (1.46x)
lumRangeFromJpeg8_1920_c:     15388.5  15383.5  (1.00x)
lumRangeFromJpeg8_1920_neon:   3150.7   3147.4  (1.00x)
lumRangeFromJpeg16_1920_c:    15389.3  17305.2  (0.89x)
lumRangeToJpeg8_1920_c:       23069.7  19226.2  (1.20x)
lumRangeToJpeg8_1920_neon:     3873.2   3627.8  (1.07x)
lumRangeToJpeg16_1920_c:      19227.8  21144.8  (0.91x)

aarch64 A76:
chrRangeFromJpeg8_1920_c:      6334.7   6263.8  (1.01x)
chrRangeFromJpeg8_1920_neon:   2264.5   2307.0  (0.98x)
chrRangeFromJpeg16_1920_c:     6336.0  11523.8  (0.55x)
chrRangeToJpeg8_1920_c:       11474.5   9610.4  (1.19x)
chrRangeToJpeg8_1920_neon:     2646.5   2794.2  (0.95x)
chrRangeToJpeg16_1920_c:       9640.5  11655.2  (0.83x)
lumRangeFromJpeg8_1920_c:      4453.2   4420.8  (1.01x)
lumRangeFromJpeg8_1920_neon:   1104.8   1107.0  (1.00x)
lumRangeFromJpeg16_1920_c:     4414.2   5762.0  (0.77x)
lumRangeToJpeg8_1920_c:        6645.0   5980.8  (1.11x)
lumRangeToJpeg8_1920_neon:     1310.5   1334.0  (0.98x)
lumRangeToJpeg16_1920_c:       6005.2   5946.2  (1.01x)

Ramiro Polla (8):
  checkasm/sw_range_convert: test negative input values
  swscale/range_convert: saturate output instead of limiting input
  swscale/aarch64/range_convert: saturate output instead of limiting
    input
  swscale/range_convert: fix mpeg ranges in yuv range conversion for
    non-8-bit pixel formats
  swscale/x86/range_convert: update sse2 and avx2 range_convert
    functions to new API
  swscale/aarch64/range_convert: update neon range_convert functions to
    new API
  swscale/x86: add sse4 and avx2 {lum,chr}ConvertRange16
  swscale/aarch64: add neon {lum,chr}ConvertRange16
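
As background for the non-8-bit fix named in the subject, the sketch below shows why the conversion gain depends on bit depth: the limited (mpeg) range bounds scale by a shift, while the full (jpeg) range maximum is (1 << bits) - 1, so their ratio changes with the number of bits. The helper names and the 14-bit fixed-point precision are assumptions for illustration, not the API introduced by this patchset.

```c
#include <stdint.h>

/* Limited ("mpeg") luma bounds for a given bit depth (bits >= 8). */
static int mpeg_luma_min(int bits) { return 16  << (bits - 8); }
static int mpeg_luma_max(int bits) { return 235 << (bits - 8); }

/* Full ("jpeg") range maximum for a given bit depth. */
static int full_max(int bits) { return (1 << bits) - 1; }

/* Hypothetical helper: rounded 14-bit fixed-point gain for limited -> full luma. */
static uint32_t luma_to_full_coeff(int bits)
{
    int64_t num = (int64_t)full_max(bits) << 14;
    int64_t den = mpeg_luma_max(bits) - mpeg_luma_min(bits);
    return (uint32_t)((num + den / 2) / den);
}
```

Under these assumptions the gain is 19077 for 8 bits, 19133 for 10 bits and 19152 for 16 bits, so coefficients derived for 8-bit content are slightly off at higher bit depths.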

 libswscale/aarch64/range_convert_neon.S       | 152 ++++++++++----
 libswscale/aarch64/swscale.c                  |  36 +++-
 libswscale/hscale.c                           |   6 +-
 libswscale/loongarch/swscale_init_loongarch.c |   5 +
 libswscale/riscv/swscale.c                    |   5 +
 libswscale/swscale.c                          | 122 ++++++++++--
 libswscale/swscale_internal.h                 |  26 ++-
 libswscale/x86/range_convert.asm              | 159 ++++++++++-----
 libswscale/x86/swscale.c                      |  50 +++--
 tests/checkasm/sw_range_convert.c             |  82 +++++++-
 .../fate/filter-alphaextract_alphamerge_rgb   | 100 +++++-----
 tests/ref/fate/filter-pixdesc-gray10be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray10le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray12be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray12le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray14be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray14le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray16be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray16le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray9be         |   2 +-
 tests/ref/fate/filter-pixdesc-gray9le         |   2 +-
 tests/ref/fate/filter-pixdesc-ya16be          |   2 +-
 tests/ref/fate/filter-pixdesc-ya16le          |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj411p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj420p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj422p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj440p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj444p        |   2 +-
 tests/ref/fate/filter-pixfmts-copy            |  34 ++--
 tests/ref/fate/filter-pixfmts-crop            |  34 ++--
 tests/ref/fate/filter-pixfmts-field           |  34 ++--
 tests/ref/fate/filter-pixfmts-fieldorder      |  30 +--
 tests/ref/fate/filter-pixfmts-hflip           |  34 ++--
 tests/ref/fate/filter-pixfmts-il              |  34 ++--
 tests/ref/fate/filter-pixfmts-lut             |  18 +-
 tests/ref/fate/filter-pixfmts-null            |  34 ++--
 tests/ref/fate/filter-pixfmts-pad             |  22 +--
 tests/ref/fate/filter-pixfmts-pullup          |  10 +-
 tests/ref/fate/filter-pixfmts-rotate          |   4 +-
 tests/ref/fate/filter-pixfmts-scale           |  34 ++--
 tests/ref/fate/filter-pixfmts-swapuv          |  10 +-
 .../ref/fate/filter-pixfmts-tinterlace_cvlpf  |   8 +-
 .../ref/fate/filter-pixfmts-tinterlace_merge  |   8 +-
 tests/ref/fate/filter-pixfmts-tinterlace_pad  |   8 +-
 tests/ref/fate/filter-pixfmts-tinterlace_vlpf |   8 +-
 tests/ref/fate/filter-pixfmts-transpose       |  28 +--
 tests/ref/fate/filter-pixfmts-vflip           |  34 ++--
 tests/ref/fate/fitsenc-gray                   |   2 +-
 tests/ref/fate/fitsenc-gray16be               |  10 +-
 tests/ref/fate/gifenc-gray                    | 186 +++++++++---------
 tests/ref/fate/idroq-video-encode             |   2 +-
 tests/ref/fate/jpg-icc                        |   8 +-
 tests/ref/fate/sws-yuv-colorspace             |   2 +-
 tests/ref/fate/sws-yuv-range                  |   2 +-
 tests/ref/fate/vvc-conformance-SCALING_A_1    | 128 ++++++------
 tests/ref/lavf/gray16be.fits                  |   4 +-
 tests/ref/lavf/gray16be.pam                   |   4 +-
 tests/ref/lavf/gray16be.png                   |   6 +-
 tests/ref/lavf/jpg                            |   6 +-
 tests/ref/lavf/smjpeg                         |   6 +-
 tests/ref/pixfmt/gbrp-gray                    |   2 +-
 tests/ref/pixfmt/gbrp-gray10be                |   2 +-
 tests/ref/pixfmt/gbrp-gray10le                |   2 +-
 tests/ref/pixfmt/gbrp-gray12be                |   2 +-
 tests/ref/pixfmt/gbrp-gray12le                |   2 +-
 tests/ref/pixfmt/gbrp-gray16be                |   2 +-
 tests/ref/pixfmt/gbrp-gray16le                |   2 +-
 tests/ref/pixfmt/gbrp-yuvj420p                |   2 +-
 tests/ref/pixfmt/gbrp-yuvj422p                |   2 +-
 tests/ref/pixfmt/gbrp-yuvj440p                |   2 +-
 tests/ref/pixfmt/gbrp-yuvj444p                |   2 +-
 tests/ref/pixfmt/gbrp10-gray                  |   2 +-
 tests/ref/pixfmt/gbrp10-gray10be              |   2 +-
 tests/ref/pixfmt/gbrp10-gray10le              |   2 +-
 tests/ref/pixfmt/gbrp10-gray12be              |   2 +-
 tests/ref/pixfmt/gbrp10-gray12le              |   2 +-
 tests/ref/pixfmt/gbrp10-gray16be              |   2 +-
 tests/ref/pixfmt/gbrp10-gray16le              |   2 +-
 tests/ref/pixfmt/gbrp10-yuvj420p              |   2 +-
 tests/ref/pixfmt/gbrp10-yuvj422p              |   2 +-
 tests/ref/pixfmt/gbrp10-yuvj440p              |   2 +-
 tests/ref/pixfmt/gbrp10-yuvj444p              |   2 +-
 tests/ref/pixfmt/gbrp12-gray                  |   2 +-
 tests/ref/pixfmt/gbrp12-gray10be              |   2 +-
 tests/ref/pixfmt/gbrp12-gray10le              |   2 +-
 tests/ref/pixfmt/gbrp12-gray12be              |   2 +-
 tests/ref/pixfmt/gbrp12-gray12le              |   2 +-
 tests/ref/pixfmt/gbrp12-gray16be              |   2 +-
 tests/ref/pixfmt/gbrp12-gray16le              |   2 +-
 tests/ref/pixfmt/gbrp12-yuvj420p              |   2 +-
 tests/ref/pixfmt/gbrp12-yuvj422p              |   2 +-
 tests/ref/pixfmt/gbrp12-yuvj440p              |   2 +-
 tests/ref/pixfmt/gbrp12-yuvj444p              |   2 +-
 tests/ref/pixfmt/gbrp16-gray16be              |   2 +-
 tests/ref/pixfmt/gbrp16-gray16le              |   2 +-
 tests/ref/pixfmt/rgb24-gray                   |   2 +-
 tests/ref/pixfmt/rgb24-gray10be               |   2 +-
 tests/ref/pixfmt/rgb24-gray10le               |   2 +-
 tests/ref/pixfmt/rgb24-gray12be               |   2 +-
 tests/ref/pixfmt/rgb24-gray12le               |   2 +-
 tests/ref/pixfmt/rgb24-gray16be               |   2 +-
 tests/ref/pixfmt/rgb24-gray16le               |   2 +-
 tests/ref/pixfmt/rgb24-yuvj420p               |   2 +-
 tests/ref/pixfmt/rgb24-yuvj422p               |   2 +-
 tests/ref/pixfmt/rgb24-yuvj440p               |   2 +-
 tests/ref/pixfmt/rgb24-yuvj444p               |   2 +-
 tests/ref/pixfmt/rgb48-gray                   |   2 +-
 tests/ref/pixfmt/rgb48-gray10be               |   2 +-
 tests/ref/pixfmt/rgb48-gray10le               |   2 +-
 tests/ref/pixfmt/rgb48-gray12be               |   2 +-
 tests/ref/pixfmt/rgb48-gray12le               |   2 +-
 tests/ref/pixfmt/rgb48-gray16be               |   2 +-
 tests/ref/pixfmt/rgb48-gray16le               |   2 +-
 tests/ref/pixfmt/rgb48-yuvj420p               |   2 +-
 tests/ref/pixfmt/rgb48-yuvj422p               |   2 +-
 tests/ref/pixfmt/rgb48-yuvj440p               |   2 +-
 tests/ref/pixfmt/rgb48-yuvj444p               |   2 +-
 tests/ref/pixfmt/yuv444p-gray10be             |   2 +-
 tests/ref/pixfmt/yuv444p-gray10le             |   2 +-
 tests/ref/pixfmt/yuv444p-gray12be             |   2 +-
 tests/ref/pixfmt/yuv444p-gray12le             |   2 +-
 tests/ref/pixfmt/yuv444p-gray16be             |   2 +-
 tests/ref/pixfmt/yuv444p-gray16le             |   2 +-
 tests/ref/pixfmt/yuv444p-yuvj420p             |   2 +-
 tests/ref/pixfmt/yuv444p-yuvj422p             |   2 +-
 tests/ref/pixfmt/yuv444p-yuvj440p             |   2 +-
 tests/ref/pixfmt/yuv444p10-gray               |   2 +-
 tests/ref/pixfmt/yuv444p10-gray10be           |   2 +-
 tests/ref/pixfmt/yuv444p10-gray10le           |   2 +-
 tests/ref/pixfmt/yuv444p10-gray12be           |   2 +-
 tests/ref/pixfmt/yuv444p10-gray12le           |   2 +-
 tests/ref/pixfmt/yuv444p10-gray16be           |   2 +-
 tests/ref/pixfmt/yuv444p10-gray16le           |   2 +-
 tests/ref/pixfmt/yuv444p10-yuvj420p           |   2 +-
 tests/ref/pixfmt/yuv444p10-yuvj422p           |   2 +-
 tests/ref/pixfmt/yuv444p10-yuvj440p           |   2 +-
 tests/ref/pixfmt/yuv444p10-yuvj444p           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray               |   2 +-
 tests/ref/pixfmt/yuv444p12-gray10be           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray10le           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray12be           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray12le           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray16be           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray16le           |   2 +-
 tests/ref/pixfmt/yuv444p12-yuvj420p           |   2 +-
 tests/ref/pixfmt/yuv444p12-yuvj422p           |   2 +-
 tests/ref/pixfmt/yuv444p12-yuvj440p           |   2 +-
 tests/ref/pixfmt/yuv444p12-yuvj444p           |   2 +-
 tests/ref/pixfmt/yuv444p16-gray16be           |   2 +-
 tests/ref/pixfmt/yuv444p16-gray16le           |   2 +-
 tests/ref/pixfmt/yuvj420p                     |   2 +-
 tests/ref/pixfmt/yuvj422p                     |   2 +-
 tests/ref/pixfmt/yuvj440p                     |   2 +-
 tests/ref/pixfmt/yuvj444p                     |   2 +-
 tests/ref/seek/lavf-jpg                       |   8 +-
 tests/ref/seek/vsynth_lena-mjpeg              |  40 ++--
 tests/ref/seek/vsynth_lena-roqvideo           |   2 +-
 tests/ref/vsynth/vsynth1-amv                  |   8 +-
 tests/ref/vsynth/vsynth1-mjpeg                |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-422            |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-huffman        |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-trell          |   8 +-
 tests/ref/vsynth/vsynth1-mjpeg-trell-huffman  |   8 +-
 tests/ref/vsynth/vsynth1-roqvideo             |   8 +-
 tests/ref/vsynth/vsynth2-amv                  |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg                |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-422            |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-huffman        |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-trell          |   8 +-
 tests/ref/vsynth/vsynth2-mjpeg-trell-huffman  |   8 +-
 tests/ref/vsynth/vsynth2-roqvideo             |   8 +-
 tests/ref/vsynth/vsynth3-amv                  |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg                |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-422            |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth3-mjpeg-huffman        |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-trell          |   6 +-
 tests/ref/vsynth/vsynth3-mjpeg-trell-huffman  |   6 +-
 tests/ref/vsynth/vsynth_lena-amv              |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg            |   8 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-422        |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-444        |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-huffman    |   8 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-trell      |   8 +-
 .../vsynth/vsynth_lena-mjpeg-trell-huffman    |   8 +-
 tests/ref/vsynth/vsynth_lena-roqvideo         |   8 +-
 188 files changed, 1189 insertions(+), 836 deletions(-)

-- 
2.39.5


