[FFmpeg-devel] [PATCH 1/2] doc/ffmpeg: rewrite the detailed description chapter
Anton Khirnov
anton at khirnov.net
Fri Oct 4 10:46:10 EEST 2024
Split it into sections that describe in detail
* the components of the transcoding pipeline
* the main features it handles, in order of complexity
    * streamcopy
    * transcoding
    * filtering
Replace the current confusing/misleading diagrams with new ones that
actually reflect the program components and data flow between them.
---
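As an illustration of the per-output option placement described in the
Streamcopy section of this patch, here is a minimal Python sketch. It only
assembles the argument list for the "split one input into several outputs"
case and does not run ffmpeg. The helper name split_streams_cmd and the
OUTPUT{}.mp4 naming pattern are assumptions made for this example; they are
not part of ffmpeg or of the patch.

```python
def split_streams_cmd(input_file, stream_specs, output_pattern="OUTPUT{}.mp4"):
    """Build (but do not run) an ffmpeg argv that streamcopies each
    mapped stream into its own output file.

    Hypothetical helper for illustration only.
    """
    argv = ["ffmpeg", "-i", input_file]
    for index, spec in enumerate(stream_specs):
        # Non-global options apply to the next output file on the command
        # line, so -map and -c copy are repeated before every output.
        argv += ["-map", spec, "-c", "copy", output_pattern.format(index)]
    return argv


if __name__ == "__main__":
    # Print the command line for splitting two streams of INPUT.mkv.
    print(" ".join(split_streams_cmd("INPUT.mkv", ["0:0", "0:1"])))
```

With the arguments shown, the printed line is the same command as the
two-output streamcopy example in the patch. Passing more stream specifiers
adds one "-map ... -c copy OUTPUTn.mp4" group per output.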
doc/ffmpeg.texi | 491 +++++++++++++++++++++++++++++++++++++-----------
1 file changed, 378 insertions(+), 113 deletions(-)
diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
index de140067ae..e17c17bcd7 100644
--- a/doc/ffmpeg.texi
+++ b/doc/ffmpeg.texi
@@ -87,140 +87,405 @@ The format option may be needed for raw input files.
@chapter Detailed description
@c man begin DETAILED DESCRIPTION
-The transcoding process in @command{ffmpeg} for each output can be described by
-the following diagram:
+ at command{ffmpeg} builds a transcoding pipeline out of the components listed
+below. The program's operation then consists of input data chunks flowing from
+the sources down the pipes towards the sinks, while being transformed by the
+components they encounter along the way.
+The following kinds of components are available:
+ at itemize
+ at item
+ at emph{Demuxers} (short for "demultiplexers") read an input source in order to
+extract
+
+ at itemize
+ at item
+global properties such as metadata or chapters;
+ at item
+a list of input elementary streams and their properties.
+ at end itemize
+
+One demuxer instance is created for each @option{-i} option, and sends encoded
+ at emph{packets} to @emph{decoders} or @emph{muxers}.
+
+In other literature, demuxers are sometimes called @emph{splitters}, because
+their main function is splitting a file into elementary streams (though some
+files only contain one elementary stream).
+
+A schematic representation of a demuxer looks like this:
@verbatim
- _______ ______________
-| | | |
-| input | demuxer | encoded data | decoder
-| file | ---------> | packets | -----+
-|_______| |______________| |
- v
- _________
- | |
- | decoded |
- | frames |
- |_________|
- ________ ______________ |
-| | | | |
-| output | <-------- | encoded data | <----+
-| file | muxer | packets | encoder
-|________| |______________|
-
-
+┌──────────┬───────────────────────┐
+│ demuxer │ │ packets for stream 0
+╞══════════╡ elementary stream 0 ├──────────────────────⮞
+│ │ │
+│ global ├───────────────────────┤
+│properties│ │ packets for stream 1
+│ and │ elementary stream 1 ├──────────────────────⮞
+│ metadata │ │
+│ ├───────────────────────┤
+│ │ │
+│ │ ........... │
+│ │ │
+│ ├───────────────────────┤
+│ │ │ packets for stream N
+│ │ elementary stream N ├──────────────────────⮞
+│ │ │
+└──────────┴───────────────────────┘
+ ⯅
+ │
+ │ read from file, network stream,
+ │ grabbing device, etc.
+ │
@end verbatim
- at command{ffmpeg} calls the libavformat library (containing demuxers) to read
-input files and get packets containing encoded data from them. When there are
-multiple input files, @command{ffmpeg} tries to keep them synchronized by
-tracking lowest timestamp on any active input stream.
+ at item
+ at emph{Decoders} receive encoded (compressed) @emph{packets} for an audio, video,
+or subtitle elementary stream, and decode them into raw @emph{frames} (arrays of
+pixels for video, PCM for audio). A decoder is typically associated with (and
+receives its input from) an elementary stream in a @emph{demuxer}, but sometimes
+may also exist on its own (see @ref{Loopback decoders}).
+
+A schematic representation of a decoder looks like this:
+ at verbatim
+ ┌─────────┐
+ packets │ │ raw frames
+─────────⮞│ decoder ├────────────⮞
+ │ │
+ └─────────┘
+ at end verbatim
+
+ at item
+ at emph{Filtergraphs} process and transform raw audio or video @emph{frames}. A
+filtergraph consists of one or more individual @emph{filters} linked into a
+graph. Filtergraphs come in two flavors - @emph{simple} and @emph{complex},
+configured with the @option{-filter} and @option{-filter_complex} options,
+respectively.
+
+A simple filtergraph is associated with an @emph{output elementary stream}; it
+receives the input to be filtered from a @emph{decoder} and sends filtered
+output to that output stream's @emph{encoder}.
+
+A simple video filtergraph that performs deinterlacing (using the @code{yadif}
+deinterlacer) followed by resizing (using the @code{scale} filter) can look like
+this:
+ at verbatim
+
+ ┌────────────────────────┐
+ │ simple filtergraph │
+ frames from ╞════════════════════════╡ frames for
+ a decoder │ ┌───────┐ ┌───────┐ │ an encoder
+────────────⮞├─⮞│ yadif ├─⮞│ scale ├─⮞│────────────⮞
+ │ └───────┘ └───────┘ │
+ └────────────────────────┘
+ at end verbatim
+
+A complex filtergraph is standalone and not associated with any specific stream.
+It may have multiple (or zero) inputs, potentially of different types (audio or
+video), each of which receives data either from a decoder or another complex
+filtergraph's outputs. It also has one or more outputs that feed either an
+encoder or another complex filtergraph's input.
+
+The following example diagram represents a complex filtergraph with 3 inputs and
+2 outputs (all video):
+ at verbatim
+ ┌─────────────────────────────────────────────────┐
+ │ complex filtergraph │
+ ╞═════════════════════════════════════════════════╡
+ frames ├───────┐ ┌─────────┐ ┌─────────┐ ┌────────┤ frames
+─────────⮞│input 0├─⮞│ overlay ├─────⮞│ overlay ├─⮞│output 0├────────⮞
+ ├───────┘ │ │ │ │ └────────┤
+ frames ├───────┐╭⮞│ │ ╭⮞│ │ │
+─────────⮞│input 1├╯ └─────────┘ │ └─────────┘ │
+ ├───────┘ │ │
+ frames ├───────┐ ┌─────┐ ┌─────┬─╯ ┌────────┤ frames
+─────────⮞│input 2├⮞│scale├⮞│split├───────────────⮞│output 1├────────⮞
+ ├───────┘ └─────┘ └─────┘ └────────┤
+ └─────────────────────────────────────────────────┘
+ at end verbatim
+Frames from the second input are overlaid over those from the first. Frames from
+the third input are rescaled, then duplicated into two identical streams. One of
+them is overlaid over the combined first two inputs, with the result exposed as
+the filtergraph's first output. The other duplicate ends up being the
+filtergraph's second output.
+
+ at item
+ at emph{Encoders} receive raw audio, video, or subtitle @emph{frames} and encode
+them into encoded @emph{packets}. The encoding (compression) process is
+typically lossy - it degrades stream quality to make the output smaller; some
+encoders are @emph{lossless}, but at the cost of much larger output. A video or
+audio encoder receives its input from some filtergraph's output, while subtitle
+encoders receive input from a decoder (since subtitle filtering is not
+supported yet). Every encoder is associated with some muxer's @emph{output
+elementary stream} and sends its output to that muxer.
+
+A schematic representation of an encoder looks like this:
+ at verbatim
+ ┌─────────┐
+ raw frames │ │ packets
+────────────⮞│ encoder ├─────────⮞
+ │ │
+ └─────────┘
+ at end verbatim
+
+ at item
+ at emph{Muxers} (short for "multiplexers") receive encoded @emph{packets} for
+their elementary streams from encoders (the @emph{transcoding} path) or directly
+from demuxers (the @emph{streamcopy} path), interleave them (when there is more
+than one elementary stream), and write the resulting bytes into the output file
+(or pipe, network stream, etc.).
+
+A schematic representation of a muxer looks like this:
+ at verbatim
+ ┌──────────────────────┬───────────┐
+ packets for stream 0 │ │ muxer │
+──────────────────────⮞│ elementary stream 0 ╞═══════════╡
+ │ │ │
+ ├──────────────────────┤ global │
+ packets for stream 1 │ │properties │
+──────────────────────⮞│ elementary stream 1 │ and │
+ │ │ metadata │
+ ├──────────────────────┤ │
+ │ │ │
+ │ ........... │ │
+ │ │ │
+ ├──────────────────────┤ │
+ packets for stream N │ │ │
+──────────────────────⮞│ elementary stream N │ │
+ │ │ │
+ └──────────────────────┴─────┬─────┘
+ │
+ write to file, network stream, │
+ grabbing device, etc. │
+ │
+ ▼
+ at end verbatim
+
+ at end itemize
+
+ at section Streamcopy
+The simplest pipeline in @command{ffmpeg} is single-stream
+ at emph{streamcopy}, that is copying one @emph{input elementary stream}'s packets
+without decoding, filtering, or encoding them. As an example, consider an input
+file called @file{INPUT.mkv} with 3 elementary streams, from which we take the
+second and write it to file @file{OUTPUT.mp4}. A schematic representation of
+such a pipeline looks like this:
+ at verbatim
+┌──────────┬─────────────────────┐
+│ demuxer │ │ unused
+╞══════════╡ elementary stream 0 ├────────╳
+│ │ │
+│INPUT.mkv ├─────────────────────┤ ┌──────────────────────┬───────────┐
+│ │ │ packets │ │ muxer │
+│ │ elementary stream 1 ├─────────⮞│ elementary stream 0 ╞═══════════╡
+│ │ │ │ │OUTPUT.mp4 │
+│ ├─────────────────────┤ └──────────────────────┴───────────┘
+│ │ │ unused
+│ │ elementary stream 2 ├────────╳
+│ │ │
+└──────────┴─────────────────────┘
+ at end verbatim
+
+The above pipeline can be constructed with the following commandline:
+ at example
+ffmpeg -i INPUT.mkv -map 0:1 -c copy OUTPUT.mp4
+ at end example
+
+In this commandline
+ at itemize
+
+ at item
+there is a single input @file{INPUT.mkv};
+
+ at item
+there are no input options for this input;
+
+ at item
+there is a single output @file{OUTPUT.mp4};
+
+ at item
+there are two output options for this output:
+
+ at itemize
+ at item
+ at code{-map 0:1} selects the input stream to be used - from input with index 0
+(i.e. the first one) the stream with index 1 (i.e. the second one);
+
+ at item
+ at code{-c copy} selects the @code{copy} encoder, i.e. streamcopy with no decoding
+or encoding.
+ at end itemize
+
+ at end itemize
+
+Streamcopy is useful for changing the elementary stream count, container format,
+or modifying container-level metadata. Since there is no decoding or encoding,
+it is very fast and there is no quality loss. However, it might not work in some
+cases because of a variety of factors (e.g. certain information required by the
+target container is not available in the source). Applying filters is obviously
+also impossible, since filters work on decoded frames.
+
+More complex streamcopy scenarios can be constructed - e.g. combining streams
+from two input files into a single output:
+ at verbatim
+┌──────────┬────────────────────┐ ┌────────────────────┬───────────┐
+│ demuxer 0│ │ packets │ │ muxer │
+╞══════════╡elementary stream 0 ├────────⮞│elementary stream 0 ╞═══════════╡
+│INPUT0.mkv│ │ │ │OUTPUT.mp4 │
+└──────────┴────────────────────┘ ├────────────────────┤ │
+┌──────────┬────────────────────┐ │ │ │
+│ demuxer 1│ │ packets │elementary stream 1 │ │
+╞══════════╡elementary stream 0 ├────────⮞│ │ │
+│INPUT1.aac│ │ └────────────────────┴───────────┘
+└──────────┴────────────────────┘
+ at end verbatim
+that can be built by the commandline
+ at example
+ffmpeg -i INPUT0.mkv -i INPUT1.aac -map 0:0 -map 1:0 -c copy OUTPUT.mp4
+ at end example
+
+The output @option{-map} option is used twice here, creating two streams in the
+output file - one fed by the first input and one by the second. The single
+instance of the @option{-c} option selects streamcopy for both of those streams.
+You could also use multiple instances of this option together with
+ at ref{Stream specifiers} to apply different values to each stream, as will be
+demonstrated in the following sections.
+
+A converse scenario is splitting multiple streams from a single input into
+multiple outputs:
+ at verbatim
+┌──────────┬─────────────────────┐ ┌───────────────────┬───────────┐
+│ demuxer │ │ packets │ │ muxer 0 │
+╞══════════╡ elementary stream 0 ├─────────⮞│elementary stream 0╞═══════════╡
+│ │ │ │ │OUTPUT0.mp4│
+│INPUT.mkv ├─────────────────────┤ └───────────────────┴───────────┘
+│ │ │ packets ┌───────────────────┬───────────┐
+│ │ elementary stream 1 ├─────────⮞│ │ muxer 1 │
+│ │ │ │elementary stream 0╞═══════════╡
+└──────────┴─────────────────────┘ │ │OUTPUT1.mp4│
+ └───────────────────┴───────────┘
+ at end verbatim
+built with
+ at example
+ffmpeg -i INPUT.mkv -map 0:0 -c copy OUTPUT0.mp4 -map 0:1 -c copy OUTPUT1.mp4
+ at end example
+Note how a separate instance of the @option{-c} option is needed for every
+output file even though their values are the same. This is because non-global
+options (that is, most of them) only apply in the context of the file before
+which they are placed.
+
+These examples can of course be further generalized into arbitrary remappings
+of any number of inputs into any number of outputs.
+
+ at section Transcoding
+ at emph{Transcoding} is the process of decoding a stream and then encoding it
+again. Since encoding tends to be computationally expensive and in most cases
+degrades the stream quality (i.e. it is @emph{lossy}), you should only transcode
+when you need to and perform streamcopy otherwise. Typical reasons to transcode
+are:
+
+ at itemize
+ at item
+applying filters - e.g. resizing, deinterlacing, or overlaying video; resampling
+or mixing audio;
+
+ at item
+feeding the stream to something that cannot decode the original codec.
+ at end itemize
+Note that @command{ffmpeg} will transcode all audio, video, and subtitle streams
+unless you specify @option{-c copy} for them.
+
+Consider an example pipeline that reads an input file with one audio and one
+video stream, transcodes the video and copies the audio into a single output
+file. This can be schematically represented as follows
+ at verbatim
+┌──────────┬─────────────────────┐
+│ demuxer │ │ audio packets
+╞══════════╡ stream 0 (audio) ├─────────────────────────────────────╮
+│ │ │ │
+│INPUT.mkv ├─────────────────────┤ video ┌─────────┐ raw │
+│ │ │ packets │ video │ video frames │
+│ │ stream 1 (video) ├─────────⮞│ decoder ├──────────────╮ │
+│ │ │ │ │ │ │
+└──────────┴─────────────────────┘ └─────────┘ │ │
+ ▼ ▼
+ │ │
+┌──────────┬─────────────────────┐ video ┌─────────┐ │ │
+│ muxer │ │ packets │ video │ │ │
+╞══════════╡ stream 0 (video) │⮜─────────┤ encoder ├──────────────╯ │
+│ │ │ │(libx264)│ │
+│OUTPUT.mp4├─────────────────────┤ └─────────┘ │
+│ │ │ │
+│ │ stream 1 (audio) │⮜────────────────────────────────────╯
+│ │ │
+└──────────┴─────────────────────┘
+ at end verbatim
+and implemented with the following commandline:
+ at example
+ffmpeg -i INPUT.mkv -map 0:v -map 0:a -c:v libx264 -c:a copy OUTPUT.mp4
+ at end example
+Note how it uses stream specifiers @code{:v} and @code{:a} to select input
+streams and apply different values of the @option{-c} option to them; see the
+ at ref{Stream specifiers} section for more details.
-Encoded packets are then passed to the decoder (unless streamcopy is selected
-for the stream, see further for a description). The decoder produces
-uncompressed frames (raw video/PCM audio/...) which can be processed further by
-filtering (see next section). After filtering, the frames are passed to the
-encoder, which encodes them and outputs encoded packets. Finally, those are
-passed to the muxer, which writes the encoded packets to the output file.
@section Filtering
-Before encoding, @command{ffmpeg} can process raw audio and video frames using
-filters from the libavfilter library. Several chained filters form a filter
-graph. @command{ffmpeg} distinguishes between two types of filtergraphs:
-simple and complex.
+
+When transcoding, audio and video streams can be filtered before encoding, with
+either a @emph{simple} or @emph{complex} filtergraph.
@subsection Simple filtergraphs
+
Simple filtergraphs are those that have exactly one input and output, both of
-the same type. In the above diagram they can be represented by simply inserting
-an additional step between decoding and encoding:
+the same type (audio or video). They are configured with the per-stream
+ at option{-filter} option (with @option{-vf} and @option{-af} aliases for
+ at option{-filter:v} (video) and @option{-filter:a} (audio) respectively). Note
+that simple filtergraphs are tied to their output stream, so e.g. if you have
+multiple audio streams, @option{-af} will create a separate filtergraph for each
+one.
+Taking the transcoding example from above, adding filtering (and omitting audio,
+for clarity) makes it look like this:
@verbatim
- _________ ______________
-| | | |
-| decoded | | encoded data |
-| frames |\ _ | packets |
-|_________| \ /||______________|
- \ __________ /
- simple _\|| | / encoder
- filtergraph | filtered |/
- | frames |
- |__________|
-
+┌──────────┬───────────────┐
+│ demuxer │ │ ┌─────────┐
+╞══════════╡ video stream │ packets │ video │ frames
+│INPUT.mkv │ ├─────────⮞│ decoder ├─────⮞───╮
+│ │ │ └─────────┘ │
+└──────────┴───────────────┘ │
+ ╭───────────⮜───────────╯
+ │ ┌────────────────────────┐
+ │ │ simple filtergraph │
+ │ ╞════════════════════════╡
+ │ │ ┌───────┐ ┌───────┐ │
+ ╰──⮞├─⮞│ yadif ├─⮞│ scale ├─⮞├╮
+ │ └───────┘ └───────┘ ││
+ └────────────────────────┘│
+ │
+ │
+┌──────────┬───────────────┐ video ┌─────────┐ │
+│ muxer │ │ packets │ video │ │
+╞══════════╡ video stream │⮜─────────┤ encoder ├───────⮜───────╯
+│OUTPUT.mp4│ │ │ │
+│ │ │ └─────────┘
+└──────────┴───────────────┘
@end verbatim
-Simple filtergraphs are configured with the per-stream @option{-filter} option
-(with @option{-vf} and @option{-af} aliases for video and audio respectively).
-A simple filtergraph for video can look for example like this:
-
- at verbatim
- _______ _____________ _______ ________
-| | | | | | | |
-| input | ---> | deinterlace | ---> | scale | ---> | output |
-|_______| |_____________| |_______| |________|
-
- at end verbatim
-
-Note that some filters change frame properties but not frame contents. E.g. the
- at code{fps} filter in the example above changes number of frames, but does not
-touch the frame contents. Another example is the @code{setpts} filter, which
-only sets timestamps and otherwise passes the frames unchanged.
-
@subsection Complex filtergraphs
+
Complex filtergraphs are those which cannot be described as simply a linear
-processing chain applied to one stream. This is the case, for example, when the graph has
-more than one input and/or output, or when output stream type is different from
-input. They can be represented with the following diagram:
-
- at verbatim
- _________
-| |
-| input 0 |\ __________
-|_________| \ | |
- \ _________ /| output 0 |
- \ | | / |__________|
- _________ \| complex | /
-| | | |/
-| input 1 |---->| filter |\
-|_________| | | \ __________
- /| graph | \ | |
- / | | \| output 1 |
- _________ / |_________| |__________|
-| | /
-| input 2 |/
-|_________|
-
- at end verbatim
-
-Complex filtergraphs are configured with the @option{-filter_complex} option.
-Note that this option is global, since a complex filtergraph, by its nature,
-cannot be unambiguously associated with a single stream or file.
-
-The @option{-lavfi} option is equivalent to @option{-filter_complex}.
+processing chain applied to one stream. This is the case, for example, when the
+graph has more than one input and/or output, or when output stream type is
+different from input. Complex filtergraphs are configured with the
+ at option{-filter_complex} option. Note that this option is global, since a
+complex filtergraph, by its nature, cannot be unambiguously associated with a
+single stream or file. Each instance of @option{-filter_complex} creates a new
+complex filtergraph, and there can be any number of them.
A trivial example of a complex filtergraph is the @code{overlay} filter, which
has two video inputs and one video output, containing one video overlaid on top
of the other. Its audio counterpart is the @code{amix} filter.
- at section Stream copy
-Stream copy is a mode selected by supplying the @code{copy} parameter to the
- at option{-codec} option. It makes @command{ffmpeg} omit the decoding and encoding
-step for the specified stream, so it does only demuxing and muxing. It is useful
-for changing the container format or modifying container-level metadata. The
-diagram above will, in this case, simplify to this:
-
- at verbatim
- _______ ______________ ________
-| | | | | |
-| input | demuxer | encoded data | muxer | output |
-| file | ---------> | packets | -------> | file |
-|_______| |______________| |________|
-
- at end verbatim
-
-Since there is no decoding or encoding, it is very fast and there is no quality
-loss. However, it might not work in some cases because of many factors. Applying
-filters is obviously also impossible, since filters work on uncompressed data.
-
+ at anchor{Loopback decoders}
@section Loopback decoders
While decoders are normally associated with demuxer streams, it is also possible
to create "loopback" decoders that decode the output from some encoder and allow
--
2.43.0