[FFmpeg-devel] [PATCH] avformat/dv: fix timestamps of audio packets in case of dropped corrupt audio frames

Tue Nov 10 05:00:13 EET 2020

> On Nov 6, 2020, at 4:03 PM, Michael Niedermayer <michael at niedermayer.cc> wrote:
> 
> On Wed, Nov 04, 2020 at 10:44:56PM +0100, Marton Balint wrote:
>> 
>> On Wed, 4 Nov 2020, Michael Niedermayer wrote:
>> 
>>> we have "millisecond" based formats, rounded timestamps
>>> we have "exact" cases, maybe the timebase being 1 packet/frame per tick
>>> we have "high precission" where the timebase is so precisse it doesnt matter
>>> 
>>> This here though is a bit an oddball, the size if 1 PCM frame is 1 sample
>>> The timebase is not a millisecond based one, its not 1 frame either nor is
>>> it exact nor high precission.
>>> Its 1 video frame, and whatever amount of audio there is in the container
>>> 
>>> which IIUC can differ from 1 video frame even rounded.
>>> maybe this just doesnt occur and each frame has a count of samples always
>>> rounded to the closes integer count for the video frame.
>> 
>> The difference between the audio timestamp and the video timestamp for
>> packets from the same DV frame is at most 0.3929636797*frame_duration as the
>> specification says, as Dave quoted, so I don't see how the error can be
>> bigger than this.
>> 
>> It looks to me you are mixing timestamps coming from a demuxer, and
>> timestamps you calculate by counting the number of demuxed/decoded audio
>> samples or video frames. Synchronization is done using the former.
>> 
> 
>>> 
>>> But if for example some hardware was using internally a 16 sample buffer
>>> and only put multiplies of 16 samples in frames this would introduce a
>>> considerable amount of jitter in the timestamps in relation to the actual
>>> duration. And using async to fix this without introducing more problems
>>> might require some care.
>> 
>> I still don't see why timestamp or duration jitter is a problem 
> 
>> as long as
>> the error is below frame_duration/2. You can safely use async with
>> min_hard_comp set to frame_duration/2.
> 
> Thats exactly what i meant. an async like filter which behaves differently
> or async with a different value there can mess this up.
> IMHO such mess up is ok when the input is corrupted or invalid. OTOH
> here it is valid and correct data.
> 
>> In general, don't you find it problematic that the dv demuxer can return
>> different timestamps if you read packets sequentially and if you seek to the
>> end of a file? It looks like a huge bug
> 
> yes, this is not great
> but even with your patch you still have this effect
> when seeking to some point in time a player has to output video and
> audio to the user at an exact time and that will differ even with async
> from linear playbacks presentation

When trying to workaround the loss of audio sync, I use -skip_initial_bytes on the dv input to jump to the frame after a missing audio pack to read from that point to keep audio and video in sync from that offset in the bytestream (at least until the next missing audio source pack).

>> which is not fixable if you insist
>> on sample counting...
> 
> I think you misunderstood me, or maybe i didnt state my opinion well,
> iam not saying that i consider what dv in git does good. Rather that there
> is a problem beyond what these patches fix. 
> Some concept of timestamp accuracy independant of the distance of representable
> values would be usefull.
> if you take teh 1/25 or whatever they are based on dv timestamps and convert that
> to teh mpeg 90khz based ones thats not making it that accurate.
> OTOH if you take 1/25 based audio where each packet is 1/25sec worth of samples
> that very well might be sample accurate or even beyond.
> knowing this accuracy is usefull for configuring a async like filter or also in
> knowing how to deal with inconsistencies, is that timestamp jtter ? or the sample
> rate jittering / some droped samples ? 
> Its important to know as in one instance its the timestamps that need adjustment 
> while in the other the samples need adjustment
> ATM its down to the user to figure out on a file by file base how to deal or
> ignore this. Instead it should be possible for an automated system to 
> compensate such issues ...

As mentioned elsewhere, some automation (or at least a logged hint) would be helpful to add or suggest aresample=async=1 to fill the gaps when using containers that don’t support sparse audio. With Marton’s patch, the user has the opportunity to use that filter to keep the audio in sync.

[…]

Dave Rice