[FFmpeg-devel] [PATCH 00/19] swscale: major API refactor and new graph dispatch

Mon Oct 14 12:47:53 EEST 2024

On Fri, 11 Oct 2024 00:26:47 +0200 Niklas Haas <ffmpeg at haasn.xyz> wrote:
> This patch series introduces the new API that was discussed in my previous
> series, and starts working towards a new, graph-based scaler dispatch
> mechanism. This currently piggybacks off of existing swscale logic, but I
> plan on incrementally rewriting the high-level innards going forwards.
>
> In order to preserve backwards compatibility, and to provide a clear migration
> path, the old type names and API functions are still accessible, and the new
> implementation is hidden inside the same SwsInternal struct. This allows us
> to reclaim the same symbol name while only wasting around 40 kB of memory per
> allocated SwsContext. In the future, this will become less as the old context
> gets deprecated and eventually removed, in favor of less monolithic, individual
> filter passes.
>
> The downside of this approach is that users must explicitly choose to either
> use the "new" API usage style or the "old" API usage style, conditionally on
> whether or not sws_init_context() was called, and cannot mix and match the two.
> However, this should not be a major issue in practice, as the only way to
> use the new API is by deliberately *omitting* all of the legacy init calls,
> something that legacy API users cannot possibly do.
>
> In exchange for this, we gain the massive upside of not needing to use
> sws_alloc_context2(), sws_scale_frame2() and so on. I consider this a decent
> compromise.
>
> Lastly, I also rewrote the test framework to facilitate further development
> of the new API, and to benchmark it to fend off any unintended performance
> regressions.
>
> The performance should be pretty similar across the board, since the
> implementation didn't really change so far. However, some cases have gotten
> dramatically faster, for example xyz12le -> yuv420p, since cascaded contexts
> and XYZ pre-passes can now be properly threaded. This results in an almost
> 400% speed improvement on my machine at 1080p.

I should add:

1. I will add the necessary version bumps and documentation changes before
   merging.

2. I decided to omit the quality presets and changes to the scaler selection
   API for now. I think the proper solution to this is to implement a
   generalized filter kernel abstraction, see:

   https://code.videolan.org/videolan/libplacebo/-/blob/master/src/filters.h

   For the quality presets, I would rather handle that after the new scaler
   API is in place, which I will first need to design and implement.
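
To illustrate the old/new usage split described in the cover letter, here is
a toy model of a single context type that serves both styles and picks the
code path based on whether the legacy init step was ever called. Every name
in it (ToyScaler, toy_init_legacy, toy_scale, ToyPath) is made up for
illustration and is not a libswscale symbol; the real series may well track
this state differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a context that hosts both code paths.
 * None of these names are real libswscale symbols. */
typedef struct ToyScaler {
    bool legacy_initialized; /* set only by the explicit legacy init call */
} ToyScaler;

typedef enum ToyPath {
    TOY_PATH_LEGACY, /* old-style usage: init was called */
    TOY_PATH_GRAPH   /* new-style usage: init was deliberately omitted */
} ToyPath;

/* Models the legacy init step; calling it opts the context into the
 * old usage style for the rest of its lifetime. */
static void toy_init_legacy(ToyScaler *s)
{
    assert(s != NULL);
    s->legacy_initialized = true;
}

/* One entry point under one name; the path taken depends only on
 * whether the legacy init step ran beforehand. */
static ToyPath toy_scale(const ToyScaler *s)
{
    assert(s != NULL);
    return s->legacy_initialized ? TOY_PATH_LEGACY : TOY_PATH_GRAPH;
}
```

A zero-initialized ToyScaler passed straight to toy_scale() takes the graph
path; calling toy_init_legacy() first selects the legacy path. This is why
existing callers, which always perform the init step, cannot end up on the
new path by accident.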