[FFmpeg-devel] [PATCH 1/2] doc/ffmpeg: rewrite the detailed description chapter

Anton Khirnov anton at khirnov.net
Fri Oct 4 10:46:10 EEST 2024


Split it into sections that describe in detail
* the components of the transcoding pipeline
* the main features it handles, in order of complexity
    * streamcopy
    * transcoding
    * filtering

Replace the current confusing/misleading diagrams with new ones that
actually reflect the program components and data flow between them.
---
 doc/ffmpeg.texi | 491 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 378 insertions(+), 113 deletions(-)

diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
index de140067ae..e17c17bcd7 100644
--- a/doc/ffmpeg.texi
+++ b/doc/ffmpeg.texi
@@ -87,140 +87,405 @@ The format option may be needed for raw input files.
 @chapter Detailed description
 @c man begin DETAILED DESCRIPTION
 
-The transcoding process in @command{ffmpeg} for each output can be described by
-the following diagram:
+@command{ffmpeg} builds a transcoding pipeline out of the components listed
+below. The program's operation then consists of input data chunks flowing from
+the sources down the pipes towards the sinks, while being transformed by the
+components they encounter along the way.
 
+The following kinds of components are available:
+@itemize
+@item
+@emph{Demuxers} (short for "demultiplexers") read an input source in order to
+extract
+
+@itemize
+@item
+global properties such as metadata or chapters;
+@item
+a list of input elementary streams and their properties.
+@end itemize
+
+One demuxer instance is created for each @option{-i} option, and sends encoded
+@emph{packets} to @emph{decoders} or @emph{muxers}.
+
+In other literature, demuxers are sometimes called @emph{splitters}, because
+their main function is splitting a file into elementary streams (though some
+files only contain one elementary stream).
+
+A schematic representation of a demuxer looks like this:
 @verbatim
- _______              ______________
-|       |            |              |
-| input |  demuxer   | encoded data |   decoder
-| file  | ---------> | packets      | -----+
-|_______|            |______________|      |
-                                           v
-                                       _________
-                                      |         |
-                                      | decoded |
-                                      | frames  |
-                                      |_________|
- ________             ______________       |
-|        |           |              |      |
-| output | <-------- | encoded data | <----+
-| file   |   muxer   | packets      |   encoder
-|________|           |______________|
-
-
+┌──────────┬───────────────────────┐
+│ demuxer  │                       │ packets for stream 0
+╞══════════╡ elementary stream 0   ├──────────────────────⮞
+│          │                       │
+│  global  ├───────────────────────┤
+│properties│                       │ packets for stream 1
+│   and    │ elementary stream 1   ├──────────────────────⮞
+│ metadata │                       │
+│          ├───────────────────────┤
+│          │                       │
+│          │     ...........       │
+│          │                       │
+│          ├───────────────────────┤
+│          │                       │ packets for stream N
+│          │ elementary stream N   ├──────────────────────⮞
+│          │                       │
+└──────────┴───────────────────────┘
+     ⯅
+     │
+     │ read from file, network stream,
+     │     grabbing device, etc.
+     │
 @end verbatim
 
-@command{ffmpeg} calls the libavformat library (containing demuxers) to read
-input files and get packets containing encoded data from them. When there are
-multiple input files, @command{ffmpeg} tries to keep them synchronized by
-tracking lowest timestamp on any active input stream.
+@item
+@emph{Decoders} receive encoded (compressed) @emph{packets} for an audio, video,
+or subtitle elementary stream, and decode them into raw @emph{frames} (arrays of
+pixels for video, PCM for audio). A decoder is typically associated with (and
+receives its input from) an elementary stream in a @emph{demuxer}, but sometimes
+may also exist on its own (see @ref{Loopback decoders}).
+
+A schematic representation of a decoder looks like this:
+@verbatim
+          ┌─────────┐
+ packets  │         │ raw frames
+─────────⮞│ decoder ├────────────⮞
+          │         │
+          └─────────┘
+@end verbatim
+
+@item
+@emph{Filtergraphs} process and transform raw audio or video @emph{frames}. A
+filtergraph consists of one or more individual @emph{filters} linked into a
+graph. Filtergraphs come in two flavors - @emph{simple} and @emph{complex},
+configured with the @option{-filter} and @option{-filter_complex} options,
+respectively.
+
+A simple filtergraph is associated with an @emph{output elementary stream}; it
+receives the input to be filtered from a @emph{decoder} and sends filtered
+output to that output stream's @emph{encoder}.
+
+A simple video filtergraph that performs deinterlacing (using the @code{yadif}
+deinterlacer) followed by resizing (using the @code{scale} filter) can look like
+this:
+@verbatim
+
+             ┌────────────────────────┐
+             │  simple filtergraph    │
+ frames from ╞════════════════════════╡ frames for
+ a decoder   │  ┌───────┐  ┌───────┐  │ an encoder
+────────────⮞├─⮞│ yadif ├─⮞│ scale ├─⮞│────────────⮞
+             │  └───────┘  └───────┘  │
+             └────────────────────────┘
+@end verbatim
+
+A complex filtergraph is standalone and not associated with any specific stream.
+It may have multiple (or zero) inputs, potentially of different types (audio or
+video), each of which receives data either from a decoder or from another
+complex filtergraph's output. It also has one or more outputs that feed either
+an encoder or another complex filtergraph's input.
+
+The following example diagram represents a complex filtergraph with 3 inputs and
+2 outputs (all video):
+@verbatim
+          ┌─────────────────────────────────────────────────┐
+          │               complex filtergraph               │
+          ╞═════════════════════════════════════════════════╡
+ frames   ├───────┐  ┌─────────┐      ┌─────────┐  ┌────────┤ frames
+─────────⮞│input 0├─⮞│ overlay ├─────⮞│ overlay ├─⮞│output 0├────────⮞
+          ├───────┘  │         │      │         │  └────────┤
+ frames   ├───────┐╭⮞│         │    ╭⮞│         │           │
+─────────⮞│input 1├╯ └─────────┘    │ └─────────┘           │
+          ├───────┘                 │                       │
+ frames   ├───────┐ ┌─────┐ ┌─────┬─╯              ┌────────┤ frames
+─────────⮞│input 2├⮞│scale├⮞│split├───────────────⮞│output 1├────────⮞
+          ├───────┘ └─────┘ └─────┘                └────────┤
+          └─────────────────────────────────────────────────┘
+@end verbatim
+Frames from the second input are overlaid over those from the first. Frames from
+the third input are rescaled, then duplicated into two identical streams. One of
+them is overlaid over the combined first two inputs, with the result exposed as
+the filtergraph's first output. The other duplicate ends up being the
+filtergraph's second output.
+
+@item
+@emph{Encoders} receive raw audio, video, or subtitle @emph{frames} and encode
+them into encoded @emph{packets}. The encoding (compression) process is
+typically lossy - it degrades stream quality to make the output smaller; some
+encoders are @emph{lossless}, but at the cost of much higher output size. A
+video or audio encoder receives its input from some filtergraph's output, while
+subtitle encoders receive their input directly from a decoder (since subtitle
+filtering is not supported yet). Every encoder is associated with some muxer's
+@emph{output elementary stream} and sends its output to that muxer.
+
+A schematic representation of an encoder looks like this:
+@verbatim
+             ┌─────────┐
+ raw frames  │         │ packets
+────────────⮞│ encoder ├─────────⮞
+             │         │
+             └─────────┘
+@end verbatim
+
+@item
+@emph{Muxers} (short for "multiplexers") receive encoded @emph{packets} for
+their elementary streams from encoders (the @emph{transcoding} path) or directly
+from demuxers (the @emph{streamcopy} path), interleave them (when there is more
+than one elementary stream), and write the resulting bytes into the output file
+(or pipe, network stream, etc.).
+
+A schematic representation of a muxer looks like this:
+@verbatim
+                       ┌──────────────────────┬───────────┐
+ packets for stream 0  │                      │   muxer   │
+──────────────────────⮞│  elementary stream 0 ╞═══════════╡
+                       │                      │           │
+                       ├──────────────────────┤  global   │
+ packets for stream 1  │                      │properties │
+──────────────────────⮞│  elementary stream 1 │   and     │
+                       │                      │ metadata  │
+                       ├──────────────────────┤           │
+                       │                      │           │
+                       │     ...........      │           │
+                       │                      │           │
+                       ├──────────────────────┤           │
+ packets for stream N  │                      │           │
+──────────────────────⮞│  elementary stream N │           │
+                       │                      │           │
+                       └──────────────────────┴─────┬─────┘
+                                                    │
+                     write to file, network stream, │
+                         grabbing device, etc.      │
+                                                    │
+                                                    ▼
+@end verbatim
+
+@end itemize
+
+@section Streamcopy
+The simplest pipeline in @command{ffmpeg} is single-stream
+@emph{streamcopy}, that is, copying one @emph{input elementary stream}'s packets
+without decoding, filtering, or encoding them. As an example, consider an input
+file called @file{INPUT.mkv} with 3 elementary streams, from which we take the
+second and write it to file @file{OUTPUT.mp4}. A schematic representation of
+such a pipeline looks like this:
+@verbatim
+┌──────────┬─────────────────────┐
+│ demuxer  │                     │ unused
+╞══════════╡ elementary stream 0 ├────────╳
+│          │                     │
+│INPUT.mkv ├─────────────────────┤          ┌──────────────────────┬───────────┐
+│          │                     │ packets  │                      │   muxer   │
+│          │ elementary stream 1 ├─────────⮞│  elementary stream 0 ╞═══════════╡
+│          │                     │          │                      │OUTPUT.mp4 │
+│          ├─────────────────────┤          └──────────────────────┴───────────┘
+│          │                     │ unused
+│          │ elementary stream 2 ├────────╳
+│          │                     │
+└──────────┴─────────────────────┘
+@end verbatim
+
+The above pipeline can be constructed with the following commandline:
+@example
+ffmpeg -i INPUT.mkv -map 0:1 -c copy OUTPUT.mp4
+@end example
+
+In this commandline:
+@itemize
+
+@item
+there is a single input @file{INPUT.mkv};
+
+@item
+there are no input options for this input;
+
+@item
+there is a single output @file{OUTPUT.mp4};
+
+@item
+there are two output options for this output:
+
+@itemize
+@item
+@code{-map 0:1} selects the input stream to be used - from input with index 0
+(i.e. the first one) the stream with index 1 (i.e. the second one);
+
+@item
+@code{-c copy} selects the @code{copy} encoder, i.e. streamcopy with no decoding
+or encoding.
+@end itemize
+
+@end itemize
+
+Streamcopy is useful for changing the elementary stream count, container format,
+or modifying container-level metadata. Since there is no decoding or encoding,
+it is very fast and there is no quality loss. However, it might not work in some
+cases because of a variety of factors (e.g. certain information required by the
+target container is not available in the source). Applying filters is obviously
+also impossible, since filters work on decoded frames.
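+
+For instance, the following commandline (the title value being just a
+placeholder) remuxes all streams of @file{INPUT.mkv} into a new container while
+replacing the container-level title:
+@example
+ffmpeg -i INPUT.mkv -map 0 -c copy -metadata title="Some title" OUTPUT.mkv
+@end example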
+
+More complex streamcopy scenarios can be constructed - e.g. combining streams
+from two input files into a single output:
+@verbatim
+┌──────────┬────────────────────┐         ┌────────────────────┬───────────┐
+│ demuxer 0│                    │ packets │                    │   muxer   │
+╞══════════╡elementary stream 0 ├────────⮞│elementary stream 0 ╞═══════════╡
+│INPUT0.mkv│                    │         │                    │OUTPUT.mp4 │
+└──────────┴────────────────────┘         ├────────────────────┤           │
+┌──────────┬────────────────────┐         │                    │           │
+│ demuxer 1│                    │ packets │elementary stream 1 │           │
+╞══════════╡elementary stream 0 ├────────⮞│                    │           │
+│INPUT1.aac│                    │         └────────────────────┴───────────┘
+└──────────┴────────────────────┘
+@end verbatim
+that can be built with the following commandline:
+@example
+ffmpeg -i INPUT0.mkv -i INPUT1.aac -map 0:0 -map 1:0 -c copy OUTPUT.mp4
+@end example
+
+The output @option{-map} option is used twice here, creating two streams in the
+output file - one fed by the first input and one by the second. The single
+instance of the @option{-c} option selects streamcopy for both of those streams.
+You could also use multiple instances of this option together with
+@ref{Stream specifiers} to apply different values to each stream, as will be
+demonstrated in following sections.
+
+A converse scenario is splitting multiple streams from a single input into
+multiple outputs:
+@verbatim
+┌──────────┬─────────────────────┐          ┌───────────────────┬───────────┐
+│ demuxer  │                     │ packets  │                   │ muxer 0   │
+╞══════════╡ elementary stream 0 ├─────────⮞│elementary stream 0╞═══════════╡
+│          │                     │          │                   │OUTPUT0.mp4│
+│INPUT.mkv ├─────────────────────┤          └───────────────────┴───────────┘
+│          │                     │ packets  ┌───────────────────┬───────────┐
+│          │ elementary stream 1 ├─────────⮞│                   │ muxer 1   │
+│          │                     │          │elementary stream 0╞═══════════╡
+└──────────┴─────────────────────┘          │                   │OUTPUT1.mp4│
+                                            └───────────────────┴───────────┘
+@end verbatim
+built with the following commandline:
+@example
+ffmpeg -i INPUT.mkv -map 0:0 -c copy OUTPUT0.mp4 -map 0:1 -c copy OUTPUT1.mp4
+@end example
+Note how a separate instance of the @option{-c} option is needed for every
+output file even though their values are the same. This is because non-global
+options (which is most of them) only apply in the context of the file before
+which they are placed.
+
+These examples can of course be further generalized into arbitrary remappings
+of any number of inputs into any number of outputs.
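+
+For example, assuming each of two inputs contains at least two streams, a
+crossed remapping into two outputs could look like this:
+@example
+ffmpeg -i INPUT0.mkv -i INPUT1.mkv \
+       -map 0:0 -map 1:1 -c copy OUTPUT0.mkv \
+       -map 0:1 -map 1:0 -c copy OUTPUT1.mkv
+@end example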
+
+@section Transcoding
+@emph{Transcoding} is the process of decoding a stream and then encoding it
+again. Since encoding tends to be computationally expensive and in most cases
+degrades the stream quality (i.e. it is @emph{lossy}), you should only transcode
+when you need to and perform streamcopy otherwise. Typical reasons to transcode
+are:
+
+@itemize
+@item
+applying filters - e.g. resizing, deinterlacing, or overlaying video; resampling
+or mixing audio;
+
+@item
+feeding the stream to something that cannot decode the original codec.
+@end itemize
+Note that @command{ffmpeg} will transcode all audio, video, and subtitle streams
+unless you specify @option{-c copy} for them.
+
+Consider an example pipeline that reads an input file with one audio and one
+video stream, transcodes the video and copies the audio into a single output
+file. This can be schematically represented as follows:
+@verbatim
+┌──────────┬─────────────────────┐
+│ demuxer  │                     │       audio packets
+╞══════════╡ stream 0 (audio)    ├─────────────────────────────────────╮
+│          │                     │                                     │
+│INPUT.mkv ├─────────────────────┤ video    ┌─────────┐     raw        │
+│          │                     │ packets  │  video  │ video frames   │
+│          │ stream 1 (video)    ├─────────⮞│ decoder ├──────────────╮ │
+│          │                     │          │         │              │ │
+└──────────┴─────────────────────┘          └─────────┘              │ │
+                                                                     ▼ ▼
+                                                                     │ │
+┌──────────┬─────────────────────┐ video    ┌─────────┐              │ │
+│ muxer    │                     │ packets  │  video  │              │ │
+╞══════════╡ stream 0 (video)    │⮜─────────┤ encoder ├──────────────╯ │
+│          │                     │          │(libx264)│                │
+│OUTPUT.mp4├─────────────────────┤          └─────────┘                │
+│          │                     │                                     │
+│          │ stream 1 (audio)    │⮜────────────────────────────────────╯
+│          │                     │
+└──────────┴─────────────────────┘
+@end verbatim
+and implemented with the following commandline:
+@example
+ffmpeg -i INPUT.mkv -map 0:v -map 0:a -c:v libx264 -c:a copy OUTPUT.mp4
+@end example
+Note how it uses stream specifiers @code{:v} and @code{:a} to select input
+streams and apply different values of the @option{-c} option to them; see the
+ at ref{Stream specifiers} section for more details.
 
-Encoded packets are then passed to the decoder (unless streamcopy is selected
-for the stream, see further for a description). The decoder produces
-uncompressed frames (raw video/PCM audio/...) which can be processed further by
-filtering (see next section). After filtering, the frames are passed to the
-encoder, which encodes them and outputs encoded packets. Finally, those are
-passed to the muxer, which writes the encoded packets to the output file.
 
 @section Filtering
-Before encoding, @command{ffmpeg} can process raw audio and video frames using
-filters from the libavfilter library. Several chained filters form a filter
-graph. @command{ffmpeg} distinguishes between two types of filtergraphs:
-simple and complex.
+
+When transcoding, audio and video streams can be filtered before encoding, with
+either a @emph{simple} or @emph{complex} filtergraph.
 
 @subsection Simple filtergraphs
+
 Simple filtergraphs are those that have exactly one input and output, both of
-the same type. In the above diagram they can be represented by simply inserting
-an additional step between decoding and encoding:
+the same type (audio or video). They are configured with the per-stream
+@option{-filter} option (with @option{-vf} and @option{-af} aliases for
+@option{-filter:v} (video) and @option{-filter:a} (audio) respectively). Note
+that simple filtergraphs are tied to their output stream, so e.g. if you have
+multiple audio streams, @option{-af} will create a separate filtergraph for each
+one.
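+
+As an illustration, assuming @file{INPUT.mkv} contains one video and two audio
+streams, a commandline along these lines copies the video and creates a separate
+resampling filtergraph (feeding a separate AAC encoder) for each of the two
+audio streams:
+@example
+ffmpeg -i INPUT.mkv -map 0 -c:v copy -c:a aac -af aresample=48000 OUTPUT.mkv
+@end example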
 
+Taking the transcoding example from above, adding filtering (and omitting audio,
+for clarity) makes it look like this:
 @verbatim
- _________                        ______________
-|         |                      |              |
-| decoded |                      | encoded data |
-| frames  |\                   _ | packets      |
-|_________| \                  /||______________|
-             \   __________   /
-  simple     _\||          | /  encoder
-  filtergraph   | filtered |/
-                | frames   |
-                |__________|
-
+┌──────────┬───────────────┐
+│ demuxer  │               │          ┌─────────┐
+╞══════════╡ video stream  │ packets  │  video  │ frames
+│INPUT.mkv │               ├─────────⮞│ decoder ├─────⮞───╮
+│          │               │          └─────────┘         │
+└──────────┴───────────────┘                              │
+                                  ╭───────────⮜───────────╯
+                                  │   ┌────────────────────────┐
+                                  │   │  simple filtergraph    │
+                                  │   ╞════════════════════════╡
+                                  │   │  ┌───────┐  ┌───────┐  │
+                                  ╰──⮞├─⮞│ yadif ├─⮞│ scale ├─⮞├╮
+                                      │  └───────┘  └───────┘  ││
+                                      └────────────────────────┘│
+                                                                │
+                                                                │
+┌──────────┬───────────────┐ video    ┌─────────┐               │
+│ muxer    │               │ packets  │  video  │               │
+╞══════════╡ video stream  │⮜─────────┤ encoder ├───────⮜───────╯
+│OUTPUT.mp4│               │          │         │
+│          │               │          └─────────┘
+└──────────┴───────────────┘
 @end verbatim
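+
+A commandline implementing the above pipeline might look like this (the scaling
+size is just an example value):
+@example
+ffmpeg -i INPUT.mkv -map 0:v -vf yadif,scale=1280:720 -c:v libx264 OUTPUT.mp4
+@end example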
 
-Simple filtergraphs are configured with the per-stream @option{-filter} option
-(with @option{-vf} and @option{-af} aliases for video and audio respectively).
-A simple filtergraph for video can look for example like this:
-
-@verbatim
- _______        _____________        _______        ________
-|       |      |             |      |       |      |        |
-| input | ---> | deinterlace | ---> | scale | ---> | output |
-|_______|      |_____________|      |_______|      |________|
-
-@end verbatim
-
-Note that some filters change frame properties but not frame contents. E.g. the
-@code{fps} filter in the example above changes number of frames, but does not
-touch the frame contents. Another example is the @code{setpts} filter, which
-only sets timestamps and otherwise passes the frames unchanged.
-
 @subsection Complex filtergraphs
+
 Complex filtergraphs are those which cannot be described as simply a linear
-processing chain applied to one stream. This is the case, for example, when the graph has
-more than one input and/or output, or when output stream type is different from
-input. They can be represented with the following diagram:
-
-@verbatim
- _________
-|         |
-| input 0 |\                    __________
-|_________| \                  |          |
-             \   _________    /| output 0 |
-              \ |         |  / |__________|
- _________     \| complex | /
-|         |     |         |/
-| input 1 |---->| filter  |\
-|_________|     |         | \   __________
-               /| graph   |  \ |          |
-              / |         |   \| output 1 |
- _________   /  |_________|    |__________|
-|         | /
-| input 2 |/
-|_________|
-
-@end verbatim
-
-Complex filtergraphs are configured with the @option{-filter_complex} option.
-Note that this option is global, since a complex filtergraph, by its nature,
-cannot be unambiguously associated with a single stream or file.
-
-The @option{-lavfi} option is equivalent to @option{-filter_complex}.
+processing chain applied to one stream. This is the case, for example, when the
+graph has more than one input and/or output, or when output stream type is
+different from input. Complex filtergraphs are configured with the
+@option{-filter_complex} option. Note that this option is global, since a
+complex filtergraph, by its nature, cannot be unambiguously associated with a
+single stream or file. Each instance of @option{-filter_complex} creates a new
+complex filtergraph, and there can be any number of them.
 
 A trivial example of a complex filtergraph is the @code{overlay} filter, which
 has two video inputs and one video output, containing one video overlaid on top
 of the other. Its audio counterpart is the @code{amix} filter.
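+
+For instance, a commandline along these lines (with placeholder input files)
+overlays the second video input on top of the first and maps the filtergraph's
+output, labelled @code{[out]}, into the output file:
+@example
+ffmpeg -i MAIN.mkv -i LOGO.png -filter_complex "[0:v][1:v]overlay[out]" \
+       -map "[out]" -c:v libx264 OUTPUT.mkv
+@end example
+
+The three-input graph pictured in the components overview above could be built
+in a similar fashion (the scaling size again being just an example value):
+@example
+ffmpeg -i INPUT0.mkv -i INPUT1.mkv -i INPUT2.mkv -filter_complex \
+  "[0:v][1:v]overlay[bg];[2:v]scale=320:240,split[s0][s1];[bg][s0]overlay[out0]" \
+  -map "[out0]" -map "[s1]" -c:v libx264 OUTPUT.mkv
+@end example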
 
-@section Stream copy
-Stream copy is a mode selected by supplying the @code{copy} parameter to the
-@option{-codec} option. It makes @command{ffmpeg} omit the decoding and encoding
-step for the specified stream, so it does only demuxing and muxing. It is useful
-for changing the container format or modifying container-level metadata. The
-diagram above will, in this case, simplify to this:
-
-@verbatim
- _______              ______________            ________
-|       |            |              |          |        |
-| input |  demuxer   | encoded data |  muxer   | output |
-| file  | ---------> | packets      | -------> | file   |
-|_______|            |______________|          |________|
-
-@end verbatim
-
-Since there is no decoding or encoding, it is very fast and there is no quality
-loss. However, it might not work in some cases because of many factors. Applying
-filters is obviously also impossible, since filters work on uncompressed data.
-
+@anchor{Loopback decoders}
 @section Loopback decoders
 While decoders are normally associated with demuxer streams, it is also possible
 to create "loopback" decoders that decode the output from some encoder and allow
-- 
2.43.0


