[FFmpeg-devel] [PATCH 00/17] swscale v2: new framework [RFC]
Niklas Haas
ffmpeg at haasn.xyz
Fri May 2 20:51:11 EEST 2025
On Sat, 26 Apr 2025 19:41:04 +0200 Niklas Haas <ffmpeg at haasn.xyz> wrote:
> Hi all,
>
> After extensive amounts of refactoring and iteration on the design and API,
> and the implementation of an x86 SIMD backend, I'm happy to present the
> revised version of my ongoing swscale rewrite. Now with 100% less reliance on
> compiler autovectorization.
>
> As before, I recommend (re)reading the design document to understand the
> motivation, structure and implementation details of this rewrite. At this
> point, I expect the major API and internal organization decisions to remain
> stable.
>
> I will preface with some benchmark figures, on my (new) AMD Ryzen 9 9950X3D:
>
> All formats:
> - single thread: Overall speedup=2.109x faster, min=0.018x max=40.309x
> - multi thread: Overall speedup=2.607x faster, min=0.112x max=254.738x
>
> "Common" formats: (referenced >100 times in FFmpeg source code)
> - single thread: Overall speedup=2.797x faster, min=0.408x max=16.514x
> - multi thread: Overall speedup=2.870x faster, min=0.715x max=21.983x

Small update: I noticed that one code path was accidentally not enabled. I
also implemented asm for the remaining bit-packed formats. After those two
changes, the new numbers are:

All formats:
- single thread: Overall speedup=4.247x faster, min=0.177x max=224.809x
- multi thread: Overall speedup=4.000x faster, min=0.256x max=968.725x

"Common" formats:
- single thread: Overall speedup=3.174x faster, min=0.596x max=12.616x
- multi thread: Overall speedup=3.005x faster, min=0.617x max=14.739x

>
> However, the main goal of this rewrite is not to improve performance, but to
> improve the maintainability, extensibility and correctness of the code. Most of
> the slowdowns for "common" formats are due to increased correctness (e.g.
> accurate rounding and dithering), and not the result of a regression per se.
>
> All of the remaining slowdowns (notably, the 0.1x cases) are due to incomplete
> coverage in the x86 SIMD backend. This currently affects bit-packed formats
> (e.g. rgb8, rgb4). (I also have not yet incorporated any AVX-512 code, which
> some of the existing routines take advantage of.)
>
> While I will continue working on this and expanding coverage to all remaining
> operations, I felt that now is a good point in time to get some code review
> and feedback regardless. I would especially appreciate code review of the x86
> SIMD code inside libswscale/x86/ops_*.asm, as this is my first time writing
> x86 assembly code.
>
> doc/APIchanges | 3 +
> doc/scaler.texi | 3 +
> doc/swscale-v2.txt | 344 +++++++++++++++++++++++++++
> libswscale/Makefile | 9 +
> libswscale/format.c | 945 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> libswscale/format.h | 29 ++-
> libswscale/graph.c | 151 ++++++++----
> libswscale/graph.h | 37 ++-
> libswscale/ops.c | 850 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> libswscale/ops.h | 263 +++++++++++++++++++++
> libswscale/ops_backend.c | 101 ++++++++
> libswscale/ops_backend.h | 181 ++++++++++++++
> libswscale/ops_chain.c | 291 +++++++++++++++++++++++
> libswscale/ops_chain.h | 108 +++++++++
> libswscale/ops_internal.h | 103 ++++++++
> libswscale/ops_optimizer.c | 810 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> libswscale/ops_tmpl_common.c | 176 ++++++++++++++
> libswscale/ops_tmpl_float.c | 255 ++++++++++++++++++++
> libswscale/ops_tmpl_int.c | 609 +++++++++++++++++++++++++++++++++++++++++++++++
> libswscale/options.c | 1 +
> libswscale/swscale.h | 7 +
> libswscale/tests/swscale.c | 11 +-
> libswscale/version.h | 2 +-
> libswscale/x86/Makefile | 3 +
> libswscale/x86/ops.c | 735 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> libswscale/x86/ops_common.asm | 208 ++++++++++++++++
> libswscale/x86/ops_float.asm | 376 +++++++++++++++++++++++++++++
> libswscale/x86/ops_int.asm | 882 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> tests/checkasm/Makefile | 8 +-
> tests/checkasm/checkasm.c | 4 +-
> tests/checkasm/checkasm.h | 26 +-
> tests/checkasm/sw_ops.c | 748 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 32 files changed, 8206 insertions(+), 73 deletions(-)
>