[FFmpeg-devel] [PATCH] fftools/ffmpeg: allow forcing input framerate on streamcopy

Fri Feb 14 11:20:24 EET 2020

On Tue, Feb 11, 2020 at 6:05 PM Paul B Mahol <onemda at gmail.com> wrote:

> From: Leo Izen <leo.izen at gmail.com>
>
> Signed-off-by: Paul B Mahol <onemda at gmail.com>
> ---
>

The OP has been asked to improve the commit message [1] and the same
applies to you. E.g. I did not initially know that this was supposed to
apply to video only.

>  fftools/ffmpeg.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c
> index b0bffe0a54..9bfa853253 100644
> --- a/fftools/ffmpeg.c
> +++ b/fftools/ffmpeg.c
> @@ -2045,10 +2045,12 @@ static void do_streamcopy(InputStream *ist,
> OutputStream *ost, const AVPacket *p
>      if (av_packet_ref(&opkt, pkt) < 0)
>          exit_program(1);
>
> -    if (pkt->pts != AV_NOPTS_VALUE)
> +    if (ist->framerate.num)
> +        opkt.pts = av_rescale_q(ist->pts, AV_TIME_BASE_Q,
> ost->mux_timebase) - ost_tb_start_time;
> +    else if (pkt->pts != AV_NOPTS_VALUE)
>          opkt.pts = av_rescale_q(pkt->pts, ist->st->time_base,
> ost->mux_timebase) - ost_tb_start_time;
>
> -    if (pkt->dts == AV_NOPTS_VALUE)
> +    if (pkt->dts == AV_NOPTS_VALUE || ist->framerate.num)
>          opkt.dts = av_rescale_q(ist->dts, AV_TIME_BASE_Q,
> ost->mux_timebase);
>      else
>          opkt.dts = av_rescale_q(pkt->dts, ist->st->time_base,
> ost->mux_timebase);
> @@ -2590,7 +2592,7 @@ static int process_input_packet(InputStream *ist,
> const AVPacket *pkt, int no_eo
>          avpkt = *pkt;
>      }
>
> -    if (pkt && pkt->dts != AV_NOPTS_VALUE) {
> +    if (pkt && pkt->dts != AV_NOPTS_VALUE && !ist->framerate.num) {
>          ist->next_dts = ist->dts = av_rescale_q(pkt->dts,
> ist->st->time_base, AV_TIME_BASE_Q);
>          if (ist->dec_ctx->codec_type != AVMEDIA_TYPE_VIDEO ||
> !ist->decoding_needed)
>              ist->next_pts = ist->pts = ist->dts;
> @@ -3146,8 +3148,14 @@ static int
> init_output_stream_streamcopy(OutputStream *ost)
>          else
>              sar = par_src->sample_aspect_ratio;
>          ost->st->sample_aspect_ratio = par_dst->sample_aspect_ratio = sar;
> -        ost->st->avg_frame_rate = ist->st->avg_frame_rate;
> -        ost->st->r_frame_rate = ist->st->r_frame_rate;
> +
> +        if (ist->framerate.num) {
> +            ost->st->avg_frame_rate = ist->framerate;
> +            ost->st->r_frame_rate = ist->framerate;
> +        } else {
> +            ost->st->avg_frame_rate = ist->st->avg_frame_rate;
> +            ost->st->r_frame_rate = ist->st->r_frame_rate;
> +        }
>

This could be better aligned.

>          break;
>      }
>

Generally, this doesn't work as expected. The most serious flaw is that the
created pts are simply not reordered. Original file:

#tb 0: 1/1000 (so the stream is 50fps)
0,        -40,          0,       20,   201020, 5ef3fbbc
0,        -20,         80,       20,    79258, 2738de70
0,          0,         40,       20,    27268, 887d2c5f
0,         20,         20,       20,     2243, 79d47408
0,         40,         60,       20,      744, 6a0b3a95
0,         60,        160,       20,    94582, 7d8fc434
0,         80,        120,       20,    35017, b9116d81
0,        100,        100,       20,     3046, 5922a16f
0,        120,        140,       20,     1888, b8c56629
0,        140,        240,       20,    93439, 96494831
0,        160,        200,       20,    32545, 01c93da0
0,        180,        180,       20,     1819, f8a1e391
0,        200,        220,       20,     1875, f536397a
0,        220,        320,       20,    91395, f5fb5adf
0,        240,        280,       20,    32321, 9a8def28
0,        260,        260,       20,     1663, 057d9518
0,        280,        300,       20,     1943, 30b376fa
0,        300,        400,       20,    92881, 8fef50e1
0,        320,        360,       20,    32350, 0c08ceed
0,        340,        340,       20,     1648, b1b33864
0,        360,        380,       20,     2047, 3de28d48
0,        380,        480,       20,    90021, 96b24f3b
0,        400,        440,       20,    31372, 4a9411ec
0,        420,        420,       20,     1706, 11cc3687
0,        440,        460,       20,     2022, a19a3246

With your patch and -r 50 (which is already the framerate) as input option
this becomes:

#tb 0: 1/50
0,         -2,         -2,        1,   201020, 5ef3fbbc
0,         -1,         -1,        1,    79258, 2738de70
0,          0,          0,        1,    27268, 887d2c5f
0,          1,          1,        1,     2243, 79d47408
0,          2,          2,        1,      744, 6a0b3a95
0,          3,          3,        1,    94582, 7d8fc434
0,          4,          4,        1,    35017, b9116d81
0,          5,          5,        1,     3046, 5922a16f
0,          6,          6,        1,     1888, b8c56629
0,          7,          7,        1,    93439, 96494831
0,          8,          8,        1,    32545, 01c93da0
0,          9,          9,        1,     1819, f8a1e391
0,         10,         10,        1,     1875, f536397a
0,         11,         11,        1,    91395, f5fb5adf
0,         12,         12,        1,    32321, 9a8def28
0,         13,         13,        1,     1663, 057d9518
0,         14,         14,        1,     1943, 30b376fa
0,         15,         15,        1,    92881, 8fef50e1
0,         16,         16,        1,    32350, 0c08ceed
0,         17,         17,        1,     1648, b1b33864
0,         18,         18,        1,     2047, 3de28d48
0,         19,         19,        1,    90021, 96b24f3b
0,         20,         20,        1,    31372, 4a9411ec
0,         21,         21,        1,     1706, 11cc3687
0,         22,         22,        1,     2022, a19a3246

Using -r 50, if I streamcopy the video and decode the video to a different
output, I get lots of warnings:

[framehash @ 0x55d147036700] Invalid DTS: 1 PTS: 0 in output stream 0:0,
replacing by guess
[framehash @ 0x55d147036700] Invalid DTS: 2 PTS: 0 in output stream 0:0,
replacing by guess
[framehash @ 0x55d147036700] Invalid DTS: 3 PTS: 0 in output stream 0:0,
replacing by guess
[framehash @ 0x55d147036700] Invalid DTS: 5 PTS: 1 in output stream 0:0,
replacing by guess
[framehash @ 0x55d147036700] Invalid DTS: 6 PTS: 1 in output stream 0:0,
replacing by guess
[framehash @ 0x55d147036700] Invalid DTS: 7 PTS: 1 in output stream 0:0,
replacing by guess
...

These warnings come from the streamcopy's muxer (in this case framehash).

If I use a vfr input file (created from the above file via mkvmerge with a
timestamp_v2 file), the durations are off. Input:

#tb 0: 1/1000
0,        -40,          0,       20,   201020, 5ef3fbbc
0,        -20,         80,      100,    79258, 2738de70
0,          0,         40,       20,    27268, 887d2c5f
0,         20,         20,       20,     2243, 79d47408
0,         40,         60,       20,      744, 6a0b3a95
0,         60,        240,       20,    94582, 7d8fc434
0,         80,        200,       20,    35017, b9116d81
0,        180,        180,       20,     3046, 5922a16f
0,        200,        220,       20,     1888, b8c56629
0,        220,        320,      420,    93439, 96494831
0,        240,        280,       20,    32545, 01c93da0
0,        260,        260,       20,     1819, f8a1e391
0,        280,        300,       20,     1875, f536397a
0,        300,        800,       20,    91395, f5fb5adf
0,        320,        760,       20,    32321, 9a8def28
0,        740,        740,       20,     1663, 057d9518
0,        760,        780,       20,     1943, 30b376fa
0,        780,       1220,       20,    92881, 8fef50e1
0,        800,       1180,       20,    32350, 0c08ceed
0,        820,        820,      360,     1648, b1b33864
0,       1180,       1200,       20,     2047, 3de28d48
0,       1200,       1300,       20,    90021, 96b24f3b
0,       1220,       1260,       20,    31372, 4a9411ec
0,       1240,       1240,       20,     1706, 11cc3687
0,       1260,       1280,       20,     2022, a19a3246

With -r 50 this becomes:

#tb 0: 1/50
0,         -2,         -2,        1,   201020, 5ef3fbbc
0,         -1,         -1,        5,    79258, 2738de70
0,          0,          0,        1,    27268, 887d2c5f
0,          1,          1,        1,     2243, 79d47408
0,          2,          2,        1,      744, 6a0b3a95
0,          3,          3,        1,    94582, 7d8fc434
0,          4,          4,        1,    35017, b9116d81
0,          5,          5,        1,     3046, 5922a16f
0,          6,          6,        1,     1888, b8c56629
0,          7,          7,       21,    93439, 96494831
0,          8,          8,        1,    32545, 01c93da0
0,          9,          9,        1,     1819, f8a1e391
0,         10,         10,        1,     1875, f536397a
0,         11,         11,        1,    91395, f5fb5adf
0,         12,         12,        1,    32321, 9a8def28
0,         13,         13,        1,     1663, 057d9518
0,         14,         14,        1,     1943, 30b376fa
0,         15,         15,        1,    92881, 8fef50e1
0,         16,         16,        1,    32350, 0c08ceed
0,         17,         17,       18,     1648, b1b33864
0,         18,         18,        1,     2047, 3de28d48
0,         19,         19,        1,    90021, 96b24f3b
0,         20,         20,        1,    31372, 4a9411ec
0,         21,         21,        1,     1706, 11cc3687
0,         22,         22,        1,     2022, a19a3246

So the duration has been simply converted based upon the timebases
involved, although I would have expected the duration to be always 1.

Fixing the duration is probably the easiest of the lot; the frame
reordering seems the most difficult, as you'd need to peek into the future
and buffer a few packets. IMO, this functionality should go into a
bitstream filter, where the buffering can be done internally by the filter;
this would also allow API users to use this.

Furthermore, there is another complication that exists regardless of
whether this is in a bitstream filter or in ffmpeg.c: For some codecs, a
packet can sometimes contain a coded field or a coded frame and this can
change in the same stream. Simply making the duration of every packet the
same is probably not what is desired in this scenario, as the parts coded
as coded fields would be twice as long. Maybe one can add this information
to the packet's flags? It could be populated with the parser. (This could
also be used to fix bug #6810.)

- Andreas

[1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2019-April/242205.html
PS: The patch does currently not apply any longer (because of
13dc90396d6d8eb70c583b94fb2978ed5a3f417c),
but fixing this is easy.