[FFmpeg-devel] [PATCH] avformat/matroskaenc: Allow customizing the time stamp precision via option

Wed May 19 20:42:55 EEST 2021

On 5/19/2021 1:09 PM, Lynne wrote:
> May 19, 2021, 17:23 by jamrial at gmail.com:
> 
>> On 5/19/2021 12:14 PM, Lynne wrote:
>>
>>> May 19, 2021, 17:12 by dev at lynne.ee:
>>>
>>>> May 19, 2021, 03:44 by michael.dirks at xaymar.com:
>>>>
>>>>> From: Michael Fabian 'Xaymar' Dirks <michael.dirks at xaymar.com>
>>>>>
>>>>> Adds the "timestamp_precision" option to matroskaenc, which allows users
>>>>> to increase the time stamp precision up to 1 nanosecond. This aids with
>>>>> certain demuxers being unable to detect constant rate. It is a work
>>>>> around fix for the problems reported in 259, 6406, 7927, 8909 and 9124.
>>>>>
>>>>> Signed-off-by: Michael Fabian 'Xaymar' Dirks <michael.dirks at xaymar.com>
>>>>> ---
>>>>>   libavformat/matroskaenc.c | 25 ++++++++++++++++++-------
>>>>>   1 file changed, 18 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
>>>>> index b4284a8778..8e2813c36e 100644
>>>>> --- a/libavformat/matroskaenc.c
>>>>> +++ b/libavformat/matroskaenc.c
>>>>> @@ -158,6 +158,8 @@ typedef struct MatroskaMuxContext {
>>>>>   int                 default_mode;
>>>>>   uint32_t            segment_uid[4];
>>>>> +
>>>>> +    AVRational          time_base;
>>>>>   } MatroskaMuxContext;
>>>>>   /** 2 bytes * 7 for EBML IDs, 7 1-byte EBML lengths, 6 1-byte uint,
>>>>> @@ -1816,6 +1818,7 @@ static int mkv_write_header(AVFormatContext *s)
>>>>>   const AVDictionaryEntry *tag;
>>>>>   int ret, i, version = 2;
>>>>>   int64_t creation_time;
>>>>> +    int64_t time_base = 1;
>>>>>   if (mkv->mode != MODE_WEBM ||
>>>>>   av_dict_get(s->metadata, "stereo_mode", NULL, 0) ||
>>>>> @@ -1852,7 +1855,10 @@ static int mkv_write_header(AVFormatContext *s)
>>>>>   return ret;
>>>>>   pb = mkv->info.bc;
>>>>>   -    put_ebml_uint(pb, MATROSKA_ID_TIMECODESCALE, 1000000);
>>>>> +    time_base = av_rescale_q(time_base, mkv->time_base, (AVRational){1, 1000000000});
>>>>> +    av_log(s, AV_LOG_DEBUG, "TimestampScale is: %" PRId64 " ns\n", time_base);
>>>>> +    put_ebml_uint(pb, MATROSKA_ID_TIMECODESCALE, time_base);
>>>>> +
>>>>>   if ((tag = av_dict_get(s->metadata, "title", NULL, 0)))
>>>>>   put_ebml_string(pb, MATROSKA_ID_TITLE, tag->value);
>>>>>   if (!(s->flags & AVFMT_FLAG_BITEXACT)) {
>>>>> @@ -1885,11 +1891,11 @@ static int mkv_write_header(AVFormatContext *s)
>>>>>   int64_t metadata_duration = get_metadata_duration(s);
>>>>>   if (s->duration > 0) {
>>>>> -            int64_t scaledDuration = av_rescale(s->duration, 1000, AV_TIME_BASE);
>>>>> +            int64_t scaledDuration = av_rescale_q(s->duration, AV_TIME_BASE_Q, mkv->time_base);
>>>>>   put_ebml_float(pb, MATROSKA_ID_DURATION, scaledDuration);
>>>>>   av_log(s, AV_LOG_DEBUG, "Write early duration from recording time = %" PRIu64 "\n", scaledDuration);
>>>>>   } else if (metadata_duration > 0) {
>>>>> -            int64_t scaledDuration = av_rescale(metadata_duration, 1000, AV_TIME_BASE);
>>>>> +            int64_t scaledDuration = av_rescale_q(metadata_duration, AV_TIME_BASE_Q, mkv->time_base);
>>>>>   put_ebml_float(pb, MATROSKA_ID_DURATION, scaledDuration);
>>>>>   av_log(s, AV_LOG_DEBUG, "Write early duration from metadata = %" PRIu64 "\n", scaledDuration);
>>>>>   } else if (s->pb->seekable & AVIO_SEEKABLE_NORMAL) {
>>>>> @@ -1950,12 +1956,12 @@ static int mkv_write_header(AVFormatContext *s)
>>>>>   // after 4k and on a keyframe
>>>>>   if (IS_SEEKABLE(pb, mkv)) {
>>>>>   if (mkv->cluster_time_limit < 0)
>>>>> -            mkv->cluster_time_limit = 5000;
>>>>> +            mkv->cluster_time_limit = av_rescale_q(5000, (AVRational){1, 1000}, mkv->time_base);
>>>>>   if (mkv->cluster_size_limit < 0)
>>>>>   mkv->cluster_size_limit = 5 * 1024 * 1024;
>>>>>   } else {
>>>>>   if (mkv->cluster_time_limit < 0)
>>>>> -            mkv->cluster_time_limit = 1000;
>>>>> +            mkv->cluster_time_limit = av_rescale_q(1000, (AVRational){1, 1000}, mkv->time_base);
>>>>>   if (mkv->cluster_size_limit < 0)
>>>>>   mkv->cluster_size_limit = 32 * 1024;
>>>>>   }
>>>>> @@ -2736,8 +2742,8 @@ static int mkv_init(struct AVFormatContext *s)
>>>>>   track->uid = mkv_get_uid(mkv->tracks, i, &c);
>>>>>   }
>>>>>   -        // ms precision is the de-facto standard timescale for mkv files
>>>>> -        avpriv_set_pts_info(st, 64, 1, 1000);
>>>>> +        // Use user-defined timescale.
>>>>> +        avpriv_set_pts_info(st, 64, mkv->time_base.num, mkv->time_base.den);
>>>>>   if (st->codecpar->codec_type == AVMEDIA_TYPE_ATTACHMENT) {
>>>>>   if (mkv->mode == MODE_WEBM) {
>>>>> @@ -2757,6 +2763,10 @@ static int mkv_init(struct AVFormatContext *s)
>>>>>   track->track_num_size = ebml_num_size(track->track_num);
>>>>>   }
>>>>>   +    // Scale the configured cluster_time_limit.
>>>>> +    if (mkv->cluster_time_limit >= 0)
>>>>> +        mkv->cluster_time_limit = av_rescale_q(mkv->cluster_time_limit, (AVRational){1, 1000}, mkv->time_base);
>>>>> +
>>>>>   if (mkv->is_dash && nb_tracks != 1)
>>>>>   return AVERROR(EINVAL);
>>>>>   @@ -2824,6 +2834,7 @@ static const AVOption options[] = {
>>>>>   { "infer", "For each track type, mark the first track of disposition default as default; if none exists, mark the first track as default.", 0, AV_OPT_TYPE_CONST, { .i64 = DEFAULT_MODE_INFER }, 0, 0, FLAGS, "default_mode" },
>>>>>   { "infer_no_subs", "For each track type, mark the first track of disposition default as default; for audio and video: if none exists, mark the first track as default.", 0, AV_OPT_TYPE_CONST, { .i64 = DEFAULT_MODE_INFER_NO_SUBS }, 0, 0, FLAGS, "default_mode" },
>>>>>   { "passthrough", "Use the disposition flag as-is", 0, AV_OPT_TYPE_CONST, { .i64 = DEFAULT_MODE_PASSTHROUGH }, 0, 0, FLAGS, "default_mode" },
>>>>> +    { "timestamp_precision", "Timestamp precision to use for the entire container", OFFSET(time_base), AV_OPT_TYPE_RATIONAL, { .dbl = 0.001 }, 0.000000001, 1.0, FLAGS},
>>>>>   { NULL },
>>>>>   };
>>>>>
>>>>
>>>> Make the default 1 ns.
>>>> This is 2021. I'm not accepting any compromises. Matroska has
>>>> dragged its heels for long enough, video players have resorted to hacks
>>>> to fix time stamps, and high-framerate video is still unacceptably jittery.
>>>>
>>>> We're bumping to 5.0 next release anyway, and if any users have issues,
>>>> they can manually set this to 1ms.
>>>>
>>>
>>> Or better yet, use the timebase to decide the value.
>>> That way, remuxing Matroska won't break anything, and the overhead
>>> will be lower.
>>>
>>
>> Unlike mp4, there's no per-track timebase in Matroska. You can't feasibly derive a timebase for the whole thing from one of them.
>> And no, we absolutely must not change the default from the recommended/defacto value defined by the spec. I worry about the amount of software out there that may not expect anything else. It's going to give us more headaches than benefits.
>>
> 
> It's going to give incentive for both implementations and the spec to change.
> It'll help out in the long run and keep Matroska alive, rather than on life support.
> The spec authors have not done any work on what has been the most important
> shortcoming of their work, despite promising over and over again for 6 years to
> come up with a reasonable, backwards compatible solution. It's not even their
> fault - just awful implementations copy pasting something the spec once mentioned
> in passing and taking it as a fact.

I don't worry about software implementations that can always make a new 
release with bugfixes. I worry about awful implementations that may be 
the fire-and-forget kind in shitty TVs and no-brand embedded devices 
that will never see any kind of firmware update.

> 
> 
>> Those who have issues with timestamps and know what they are doing can set timebase to whatever they need. But people doing "ffmpeg -i input output" because they read a tutorial online should not start finding their encodes potentially not playing on their TVs or set-top boxes.
>>
> 
> We'd have a setting for this reason.

Yes, for people that know it exists, and how to use it. The kind of user 
i mentioned above is not it, and we should not potentially break 
playback of their files. Defaults should be sane, follow the spec, and 
known to work. We simply don't know how the many existing 
implementations will behave with 1ns or other values.
Also, as mentioned earlier, the block timestamp relative to the start of 
the cluster is stored as an signed 16bit value, so timebases like 1ns 
are going to result in a lot of overheard from making one cluster per 
block. That's just not a good default.

movenc recently added an option similar to this one, without changing 
the default. These are advanced options, and should be used only when 
you know what you're doing.