[FFmpeg-devel] [PATCH 1/2] doc/filters: Add CUDA Video Filters section for CUDA-based and CUDA+NPP based filters.

Tue Jan 28 21:12:30 EET 2025

---
 doc/filters.texi | 687 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 687 insertions(+)

diff --git a/doc/filters.texi b/doc/filters.texi
index a14c7e7e77..c4f312d2b8 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -26890,6 +26890,693 @@ value.
 
 @c man end VIDEO FILTERS
 
+@chapter CUDA Video Filters
+@c man begin CUDA Video Filters
+
+To enable compilation of these filters you need to configure FFmpeg with
+@code{--enable-cuda-nvcc} and/or @code{--enable-libnpp}, and the Nvidia CUDA Toolkit must be installed.
+
+Running CUDA filters requires you to initialize a hardware device and to pass that device to all filters in any filter graph.
+@table @option
+
+@item -init_hw_device cuda[=@var{name}][:@var{device}[,@var{key=value}...]]
+Initialize a new hardware device of type @var{cuda} called @var{name}, using the
+given device parameters.
+
+@item -filter_hw_device @var{name}
+Pass the hardware device called @var{name} to all filters in any filter graph.
+
+@end table
+
+For more detailed information see @url{https://www.ffmpeg.org/ffmpeg.html#Advanced-Video-options}
+
+@itemize
+@item
+Example of initializing the second CUDA device on the system and running the scale_cuda and bilateral_cuda filters:
+@example
+./ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -init_hw_device cuda:1 -filter_complex \
+"[0:v]scale_cuda=format=yuv444p[scaled_video];[scaled_video]bilateral_cuda=window_size=9:sigmaS=3.0:sigmaR=50.0" \
+-an -sn -c:v h264_nvenc -cq 20 out.mp4
+@end example
+@end itemize
+
+Since CUDA filters operate exclusively on GPU memory, frame data must sometimes be uploaded (@ref{hwupload}) to hardware surfaces associated with the appropriate CUDA device before processing, and downloaded (@ref{hwdownload}) back to normal memory afterward, if required. Whether @ref{hwupload} or @ref{hwdownload} is necessary depends on the specific workflow:
+
+@itemize
+@item If the input frames are already in GPU memory (e.g., when using @code{-hwaccel cuda} together with @code{-hwaccel_output_format cuda}), explicit use of @ref{hwupload} is not needed, as the data is already in the appropriate memory space.
+@item If the input frames are in CPU memory (e.g., software-decoded frames or frames processed by CPU-based filters), it is necessary to use @ref{hwupload} to transfer the data to GPU memory for CUDA processing.
+@item If the output of the CUDA filters needs to be further processed by software-based filters or saved in a format not supported by GPU-based encoders, @ref{hwdownload} is required to transfer the data back to CPU memory.
+@end itemize
+Note that @ref{hwupload} uploads data to a surface with the same layout as the software frame, so it may be necessary to add a @ref{format} filter immediately before @ref{hwupload} to ensure the input is in the correct format. Similarly, @ref{hwdownload} may not support all output formats, so an additional @ref{format} filter may need to be inserted immediately after @ref{hwdownload} in the filter graph to ensure compatibility.
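+
+As an illustrative sketch of the software-decode case (the file names, the device
+name @code{cu} and the choice of @code{libx264} are placeholders, and the command
+assumes a build configured with @code{--enable-cuda-nvcc}), the following uploads
+the frames, scales them on the GPU and downloads the result for a software encoder:
+@example
+ffmpeg -init_hw_device cuda=cu -filter_hw_device cu -i input.mp4 \
+-vf "format=nv12,hwupload,scale_cuda=1280:720,hwdownload,format=nv12" \
+-an -c:v libx264 output.mp4
+@end example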
+
+@section CUDA
+Below is a description of the currently available Nvidia CUDA video filters.
+
+To enable compilation of these filters you need to configure FFmpeg with
+@code{--enable-cuda-nvcc}, and the Nvidia CUDA Toolkit must be installed.
+
+@subsection bilateral_cuda
+CUDA accelerated bilateral filter, an edge preserving filter.
+This filter is mathematically accurate thanks to the use of GPU acceleration.
+For best output quality, use one to one chroma subsampling, i.e. yuv444p format.
+
+The filter accepts the following options:
+@table @option
+@item sigmaS
+Set sigma of Gaussian function to calculate spatial weight, also called sigma space.
+Allowed range is 0.1 to 512. Default is 0.1.
+
+@item sigmaR
+Set sigma of Gaussian function to calculate color range weight, also called sigma color.
+Allowed range is 0.1 to 512. Default is 0.1.
+
+@item window_size
+Set window size of the bilateral function to determine the number of neighbours to loop on.
+If the number entered is even, one will be added automatically.
+Allowed range is 1 to 255. Default is 1.
+@end table
+@subsubsection Examples
+
+@itemize
+@item
+Apply the bilateral filter on a video.
+
+@example
+./ffmpeg -v verbose \
+-hwaccel cuda -hwaccel_output_format cuda -i input.mp4  \
+-init_hw_device cuda \
+-filter_complex \
+" \
+[0:v]scale_cuda=format=yuv444p[scaled_video];
+[scaled_video]bilateral_cuda=window_size=9:sigmaS=3.0:sigmaR=50.0" \
+-an -sn -c:v h264_nvenc -cq 20 out.mp4
+@end example
+
+@end itemize
+
+@subsection bwdif_cuda
+
+Deinterlace the input video using the @ref{bwdif} algorithm, but implemented
+in CUDA so that it can work as part of a GPU accelerated pipeline with nvdec
+and/or nvenc.
+
+It accepts the following parameters:
+
+@table @option
+@item mode
+The interlacing mode to adopt. It accepts one of the following values:
+
+@table @option
+@item 0, send_frame
+Output one frame for each frame.
+@item 1, send_field
+Output one frame for each field.
+@end table
+
+The default value is @code{send_field}.
+
+@item parity
+The picture field parity assumed for the input interlaced video. It accepts one
+of the following values:
+
+@table @option
+@item 0, tff
+Assume the top field is first.
+@item 1, bff
+Assume the bottom field is first.
+@item -1, auto
+Enable automatic detection of field parity.
+@end table
+
+The default value is @code{auto}.
+If the interlacing is unknown or the decoder does not export this information,
+top field first will be assumed.
+
+@item deint
+Specify which frames to deinterlace. Accepts one of the following
+values:
+
+@table @option
+@item 0, all
+Deinterlace all frames.
+@item 1, interlaced
+Only deinterlace frames marked as interlaced.
+@end table
+
+The default value is @code{all}.
+@end table
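+
+@subsubsection Examples
+
+A minimal sketch, assuming an interlaced input decoded by nvdec (the file names
+and the encoder settings are placeholders), that outputs one frame per field and
+only touches frames flagged as interlaced:
+@example
+ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i interlaced.ts \
+-vf "bwdif_cuda=mode=send_field:parity=auto:deint=interlaced" \
+-an -c:v h264_nvenc -cq 20 output.mp4
+@end example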
+
+@subsection chromakey_cuda
+CUDA accelerated YUV colorspace color/chroma keying.
+
+This filter works like the normal chromakey filter but operates on CUDA frames.
+For more details and parameters see @ref{chromakey}.
+
+@subsubsection Examples
+
+@itemize
+@item
+Make all the green pixels in the input video transparent and use it as an overlay for another video:
+
+@example
+./ffmpeg \
+    -hwaccel cuda -hwaccel_output_format cuda -i input_green.mp4  \
+    -hwaccel cuda -hwaccel_output_format cuda -i base_video.mp4 \
+    -init_hw_device cuda \
+    -filter_complex \
+    " \
+        [0:v]chromakey_cuda=0x25302D:0.1:0.12:1[overlay_video]; \
+        [1:v]scale_cuda=format=yuv420p[base]; \
+        [base][overlay_video]overlay_cuda" \
+    -an -sn -c:v h264_nvenc -cq 20 output.mp4
+@end example
+
+@item
+Process two software sources, explicitly uploading the frames:
+
+@example
+./ffmpeg -init_hw_device cuda=cuda -filter_hw_device cuda \
+    -f lavfi -i color=size=800x600:color=white,format=yuv420p \
+    -f lavfi -i yuvtestsrc=size=200x200,format=yuv420p \
+    -filter_complex \
+    " \
+        [0]hwupload[under]; \
+        [1]hwupload,chromakey_cuda=green:0.1:0.12[over]; \
+        [under][over]overlay_cuda" \
+    -c:v hevc_nvenc -cq 18 -preset slow output.mp4
+@end example
+
+@end itemize
+
+@subsection colorspace_cuda
+
+CUDA accelerated implementation of the colorspace filter.
+
+It is by no means feature complete compared to the software colorspace filter,
+and at the current time only supports color range conversion between jpeg/full
+and mpeg/limited range.
+
+The filter accepts the following options:
+
+@table @option
+@item range
+Specify output color range.
+
+The accepted values are:
+@table @samp
+@item tv
+TV (restricted) range
+
+@item mpeg
+MPEG (restricted) range
+
+@item pc
+PC (full) range
+
+@item jpeg
+JPEG (full) range
+
+@end table
+
+@end table
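+
+@subsubsection Examples
+
+As a hedged illustration, assuming the input already consists of limited-range
+CUDA frames, the following filter string requests full-range output:
+@example
+colorspace_cuda=range=pc
+@end example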
+
+@anchor{overlay_cuda_section}
+@subsection overlay_cuda
+
+Overlay one video on top of another.
+
+This is the CUDA variant of the @ref{overlay} filter.
+It only accepts CUDA frames. The underlying input pixel formats have to match.
+
+It takes two inputs and has one output. The first input is the "main"
+video on which the second input is overlaid.
+
+It accepts the following parameters:
+
+@table @option
+@item x
+@item y
+Set expressions for the x and y coordinates of the overlaid video
+on the main video.
+
+They can contain the following parameters:
+
+@table @option
+
+@item main_w, W
+@item main_h, H
+The main input width and height.
+
+@item overlay_w, w
+@item overlay_h, h
+The overlay input width and height.
+
+@item x
+@item y
+The computed values for @var{x} and @var{y}. They are evaluated for
+each new frame.
+
+@item n
+The ordinal index of the main input frame, starting from 0.
+
+@item pos
+The byte offset position in the file of the main input frame, NAN if unknown.
+Deprecated, do not use.
+
+@item t
+The timestamp of the main input frame, expressed in seconds, NAN if unknown.
+
+@end table
+
+Default value is "0" for both expressions.
+
+@item eval
+Set when the expressions for @option{x} and @option{y} are evaluated.
+
+It accepts the following values:
+@table @option
+@item init
+Evaluate expressions once during filter initialization or
+when a command is processed.
+
+@item frame
+Evaluate expressions for each incoming frame.
+@end table
+
+Default value is @option{frame}.
+
+@item eof_action
+See @ref{framesync}.
+
+@item shortest
+See @ref{framesync}.
+
+@item repeatlast
+See @ref{framesync}.
+
+@end table
+
+This filter also supports the @ref{framesync} options.
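+
+@subsubsection Examples
+
+A sketch only, assuming both inputs decode to CUDA frames with the same pixel
+format (the file names and the 10-pixel margin are placeholders): place the
+second input in the top-right corner of the main video.
+@example
+ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i main.mp4 \
+-hwaccel cuda -hwaccel_output_format cuda -i logo.mp4 \
+-filter_complex "[0:v][1:v]overlay_cuda=x=main_w-overlay_w-10:y=10" \
+-an -c:v h264_nvenc -cq 20 output.mp4
+@end example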
+
+@anchor{scale_cuda_section}
+@subsection scale_cuda
+
+Scale (resize) and convert (pixel format) the input video, using accelerated CUDA kernels.
+Setting the output width and height works in the same way as for the @ref{scale} filter.
+
+The filter accepts the following options:
+@table @option
+@item w
+@item h
+Set the output video dimension expression. Default value is the input dimension.
+
+Allows for the same expressions as the @ref{scale} filter.
+
+@item interp_algo
+Sets the algorithm used for scaling:
+
+@table @var
+@item nearest
+Nearest neighbour
+
+Used by default if input parameters match the desired output.
+
+@item bilinear
+Bilinear
+
+@item bicubic
+Bicubic
+
+This is the default.
+
+@item lanczos
+Lanczos
+
+@end table
+
+@item format
+Controls the output pixel format. By default, or if none is specified, the input
+pixel format is used.
+
+The filter does not support converting between YUV and RGB pixel formats.
+
+@item passthrough
+If set to 0, every frame is processed, even if no conversion is necessary.
+This mode can be useful to use the filter as a buffer for a downstream
+frame-consumer that exhausts the limited decoder frame pool.
+
+If set to 1, frames are passed through as-is if they match the desired output
+parameters. This is the default behaviour.
+
+@item param
+Algorithm-specific parameter.
+
+Affects the curves of the bicubic algorithm.
+
+@item force_original_aspect_ratio
+@item force_divisible_by
+Work the same as the identical @ref{scale} filter options.
+
+@end table
+
+@subsubsection Examples
+
+@itemize
+@item
+Scale input to 720p, keeping aspect ratio and ensuring the output is yuv420p.
+@example
+scale_cuda=-2:720:format=yuv420p
+@end example
+
+@item
+Upscale to 4K using nearest neighbour algorithm.
+@example
+scale_cuda=4096:2160:interp_algo=nearest
+@end example
+
+@item
+Don't do any conversion or scaling, but copy all input frames into newly allocated ones.
+This can be useful to deal with a filter and encode chain that otherwise exhausts the
+decoder's frame pool.
+@example
+scale_cuda=passthrough=0
+@end example
+@end itemize
+
+@subsection yadif_cuda
+
+Deinterlace the input video using the @ref{yadif} algorithm, but implemented
+in CUDA so that it can work as part of a GPU accelerated pipeline with nvdec
+and/or nvenc.
+
+It accepts the following parameters:
+
+
+@table @option
+
+@item mode
+The interlacing mode to adopt. It accepts one of the following values:
+
+@table @option
+@item 0, send_frame
+Output one frame for each frame.
+@item 1, send_field
+Output one frame for each field.
+@item 2, send_frame_nospatial
+Like @code{send_frame}, but it skips the spatial interlacing check.
+@item 3, send_field_nospatial
+Like @code{send_field}, but it skips the spatial interlacing check.
+@end table
+
+The default value is @code{send_frame}.
+
+@item parity
+The picture field parity assumed for the input interlaced video. It accepts one
+of the following values:
+
+@table @option
+@item 0, tff
+Assume the top field is first.
+@item 1, bff
+Assume the bottom field is first.
+@item -1, auto
+Enable automatic detection of field parity.
+@end table
+
+The default value is @code{auto}.
+If the interlacing is unknown or the decoder does not export this information,
+top field first will be assumed.
+
+@item deint
+Specify which frames to deinterlace. Accepts one of the following
+values:
+
+@table @option
+@item 0, all
+Deinterlace all frames.
+@item 1, interlaced
+Only deinterlace frames marked as interlaced.
+@end table
+
+The default value is @code{all}.
+@end table
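+
+@subsubsection Examples
+
+A minimal sketch, assuming frames are already on the GPU (for instance decoded
+by nvdec), that spells out the documented defaults explicitly:
+@example
+yadif_cuda=mode=send_frame:parity=auto:deint=all
+@end example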
+
+@section CUDA NPP
+Below is a description of the currently available NVIDIA Performance Primitives (libnpp) video filters.
+
+To enable compilation of these filters you need to configure FFmpeg with @code{--enable-libnpp}, and the Nvidia CUDA Toolkit must be installed.
+
+@anchor{scale_npp_section}
+@subsection scale_npp
+
+Use the NVIDIA Performance Primitives (libnpp) to perform scaling and/or pixel
+format conversion on CUDA video frames. Setting the output width and height
+works in the same way as for the @var{scale} filter.
+
+The following additional options are accepted:
+@table @option
+@item format
+The pixel format of the output CUDA frames. If set to the string "same" (the
+default), the input format will be kept. Note that automatic format negotiation
+and conversion is not yet supported for hardware frames.
+
+@item interp_algo
+The interpolation algorithm used for resizing. One of the following:
+@table @option
+@item nn
+Nearest neighbour.
+
+@item linear
+@item cubic
+@item cubic2p_bspline
+2-parameter cubic (B=1, C=0)
+
+@item cubic2p_catmullrom
+2-parameter cubic (B=0, C=1/2)
+
+@item cubic2p_b05c03
+2-parameter cubic (B=1/2, C=3/10)
+
+@item super
+Supersampling
+
+@item lanczos
+@end table
+
+@item force_original_aspect_ratio
+Enable decreasing or increasing output video width or height if necessary to
+keep the original aspect ratio. Possible values:
+
+@table @samp
+@item disable
+Scale the video as specified and disable this feature.
+
+@item decrease
+The output video dimensions will automatically be decreased if needed.
+
+@item increase
+The output video dimensions will automatically be increased if needed.
+
+@end table
+
+One useful instance of this option is that when you know a specific device's
+maximum allowed resolution, you can use this to limit the output video to
+that, while retaining the aspect ratio. For example, device A allows
+1280x720 playback, and your video is 1920x800. Using this option (set it to
+decrease) and specifying 1280x720 to the command line makes the output
+1280x533.
+
+Please note that this is a different thing than specifying -1 for @option{w}
+or @option{h}, you still need to specify the output resolution for this option
+to work.
+
+@item force_divisible_by
+Ensures that both the output dimensions, width and height, are divisible by the
+given integer when used together with @option{force_original_aspect_ratio}. This
+works similarly to using @code{-n} in the @option{w} and @option{h} options.
+
+This option respects the value set for @option{force_original_aspect_ratio},
+increasing or decreasing the resolution accordingly. The video's aspect ratio
+may be slightly modified.
+
+This option can be handy if you need to have a video fit within or exceed
+a defined resolution using @option{force_original_aspect_ratio} but also have
+encoder restrictions on width or height divisibility.
+
+@item eval
+Specify when to evaluate @var{width} and @var{height} expression. It accepts the following values:
+
+@table @samp
+@item init
+Only evaluate expressions once during the filter initialization or when a command is processed.
+
+@item frame
+Evaluate expressions for each incoming frame.
+
+@end table
+
+@end table
+
+The values of the @option{w} and @option{h} options are expressions
+containing the following constants:
+
+@table @var
+@item in_w
+@item in_h
+The input width and height.
+
+@item iw
+@item ih
+These are the same as @var{in_w} and @var{in_h}.
+
+@item out_w
+@item out_h
+The output (scaled) width and height.
+
+@item ow
+@item oh
+These are the same as @var{out_w} and @var{out_h}.
+
+@item a
+The same as @var{iw} / @var{ih}.
+
+@item sar
+The input sample aspect ratio.
+
+@item dar
+The input display aspect ratio. Calculated from @code{(iw / ih) * sar}.
+
+@item n
+The (sequential) number of the input frame, starting from 0.
+Only available with @code{eval=frame}.
+
+@item t
+The presentation timestamp of the input frame, expressed as a number of
+seconds. Only available with @code{eval=frame}.
+
+@item pos
+The position (byte offset) of the frame in the input stream, or NaN if
+this information is unavailable and/or meaningless (for example in case of synthetic video).
+Only available with @code{eval=frame}.
+Deprecated, do not use.
+@end table
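+
+@subsubsection Examples
+
+To make the @option{force_original_aspect_ratio} description above concrete,
+here is a sketch that assumes a 1920x800 CUDA input. Fitting it into 1280x720
+keeps the width at 1280 and reduces the height to about 1280 * 800 / 1920 = 533:
+@example
+scale_npp=w=1280:h=720:force_original_aspect_ratio=decrease
+@end example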
+
+@subsection scale2ref_npp
+
+Use the NVIDIA Performance Primitives (libnpp) to scale (resize) the input
+video, based on a reference video.
+
+See the @ref{scale_npp} filter for available options; scale2ref_npp supports the same
+options but uses the reference video instead of the main input as basis. scale2ref_npp
+also supports the following additional constants for the @option{w} and
+@option{h} options:
+
+@table @var
+@item main_w
+@item main_h
+The main input video's width and height.
+
+@item main_a
+The same as @var{main_w} / @var{main_h}.
+
+@item main_sar
+The main input video's sample aspect ratio.
+
+@item main_dar, mdar
+The main input video's display aspect ratio. Calculated from
+@code{(main_w / main_h) * main_sar}.
+
+@item main_n
+The (sequential) number of the main input frame, starting from 0.
+Only available with @code{eval=frame}.
+
+@item main_t
+The presentation timestamp of the main input frame, expressed as a number of
+seconds. Only available with @code{eval=frame}.
+
+@item main_pos
+The position (byte offset) of the frame in the main input stream, or NaN if
+this information is unavailable and/or meaningless (for example in case of synthetic video).
+Only available with @code{eval=frame}.
+@end table
+
+@subsubsection Examples
+
+@itemize
+@item
+Scale a subtitle stream (b) to match the main video (a) in size before overlaying:
+@example
+'scale2ref_npp[b][a];[a][b]overlay_cuda'
+@end example
+
+@item
+Scale a logo to 1/10th the height of a video, while preserving its display aspect ratio.
+@example
+[logo-in][video-in]scale2ref_npp=w=oh*mdar:h=ih/10[logo-out][video-out]
+@end example
+@end itemize
+
+@subsection sharpen_npp
+Use the NVIDIA Performance Primitives (libnpp) to perform image sharpening with
+border control.
+
+The following additional options are accepted:
+@table @option
+
+@item border_type
+Type of sampling to be used at frame borders. One of the following:
+@table @option
+
+@item replicate
+Replicate pixel values.
+
+@end table
+@end table
+
+@subsection transpose_npp
+
+Transpose rows with columns in the input video and optionally flip it.
+For more in-depth examples see the @ref{transpose} video filter, which shares mostly the same options.
+
+It accepts the following parameters:
+
+@table @option
+
+@item dir
+Specify the transposition direction.
+
+Can assume the following values:
+@table @samp
+@item cclock_flip
+Rotate by 90 degrees counterclockwise and vertically flip. (default)
+
+@item clock
+Rotate by 90 degrees clockwise.
+
+@item cclock
+Rotate by 90 degrees counterclockwise.
+
+@item clock_flip
+Rotate by 90 degrees clockwise and vertically flip.
+@end table
+
+@item passthrough
+Do not apply the transposition if the input geometry matches the one
+specified by this option. It accepts the following values:
+@table @samp
+@item none
+Always apply transposition. (default)
+@item portrait
+Preserve portrait geometry (when @var{height} >= @var{width}).
+@item landscape
+Preserve landscape geometry (when @var{width} >= @var{height}).
+@end table
+
+@end table
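+
+@subsubsection Examples
+
+As an illustrative sketch, assuming CUDA input frames in a pixel format the
+filter accepts, rotate by 90 degrees clockwise unless the input is already in
+portrait orientation:
+@example
+transpose_npp=dir=clock:passthrough=portrait
+@end example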
+
+@c man end CUDA Video Filters
+
+
 @chapter OpenCL Video Filters
 @c man begin OPENCL VIDEO FILTERS
 
-- 
2.39.5 (Apple Git-154)