[FFmpeg-devel] [PATCH v2 2/2] ffmpeg: add option -isync

Gyan Doshi ffmpeg at gyani.pro
Mon Jul 4 11:20:22 EEST 2022



On 2022-07-04 11:51 am, Anton Khirnov wrote:
> Quoting Gyan Doshi (2022-07-02 11:51:53)
>>
>> On 2022-07-02 02:12 pm, Anton Khirnov wrote:
>>> Quoting Gyan Doshi (2022-07-01 13:03:04)
>>>> On 2022-07-01 03:33 pm, Anton Khirnov wrote:
>>>>> Quoting Gyan Doshi (2022-06-25 10:29:51)
>>>>>> This is a per-file input option that adjusts an input's timestamps
>>>>>> with reference to another input, so that emitted packet timestamps
>>>>>> account for the difference between the start times of the two inputs.
>>>>>>
>>>>>> Typical use case is to sync two or more live inputs such as from capture
>>>>>> devices. Both the target and reference input source timestamps should be
>>>>>> based on the same clock source.
>>>>> If both streams are using the same clock, then why is any extra
>>>>> synchronization needed?
>>>> Because ffmpeg.c normalizes timestamps by default. We can keep
>>>> timestamps using -copyts, but these inputs are usually preprocessed
>>>> using single-input filters which won't have access to the reference
>>>> inputs,
>>> No idea what you mean by "reference inputs" here.
>> The reference input is the one the target is being synced against. e.g.
>> in a karaoke session -  the music track from a DAW would be ref and the
>> user's voice via mic is the target.
>>
>>>> or the merge filters like e.g. amix don't sync by timestamp.
>>> amix does seem to look at timestamps.
>> amix does not *sync* by timestamp. If one input starts at 4 and the
>> other at 7, the 2nd isn't aligned by timestamp.
> So maybe it should?
>
> My concern generally with this patchset is that it seems like you're
> changing things where it's easier to do rather than where it's correct.

There are many multi-input filters which may be used. amix is just one 
example.

The basic 'deficiency' here is that filters operate on frames and, for 
the most part, look only at single frames, even though frames belong to 
streams. These streams may have companion streams (possibly grouped into 
programs) which are part of a single input, and these inputs may have 
companion inputs. Anything in this tree may be relevant as a reference 
for a particular operation, e.g. we have a bespoke filter scale2ref so 
that we can look at another stream's frames, but we don't have pad2ref, 
crop2ref, etc. So the absolutely correct thing to do would be to supply 
a global context to processing modules like filtergraphs, maybe an array 
of dicts containing attributes of all inputs: starting timestamps, 
resolution, string metadata, etc. That would obviate the need for these 
bespoke fields and even filters.

But that's a much larger design undertaking and I'm just addressing one 
specific practical need here. This patch is currently being used 
successfully by commercial users in a private build. Many users have 
posted to ffmpeg-users and popular forums over the years asking for 
something that achieves this.
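
For illustration, the intended usage looks roughly like this (a sketch 
based on this patchset, not merged behavior; the input devices and 
filenames are made up, and -isync takes the index of the reference input 
and, like any per-file input option, precedes the -i it applies to):

```shell
# Sync input 1 (mic capture, the target) against input 0 (the
# reference track). Both inputs' source timestamps are assumed to
# come from the same clock; -isync 0 adjusts input 1's timestamps
# by the difference between the two inputs' start times, so a
# downstream merge filter such as amix sees aligned streams.
ffmpeg -f alsa -i hw:0 \
       -isync 0 -f alsa -i hw:1 \
       -filter_complex "[0:a][1:a]amix=inputs=2[a]" \
       -map "[a]" -c:a aac out.m4a
```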

Actually, this functionality sounds like it sort of existed earlier in 
the form of map sync (i.e. -map 1:a,0:a:1). Although the assignment 
syntax still remains (and doesn't warn or error out), it's a no-op now: 
the application code was removed in 2012 by Michael, who said he based 
it on an idea from one of your commits, presumably in Libav.

Regards,
Gyan
