[FFmpeg-devel] [PATCH] avformat/dv: fix timestamps of audio packets in case of dropped corrupt audio frames

Sat Feb 20 19:20:34 EET 2021

Hi,

> On Oct 31, 2020, at 5:15 PM, Marton Balint <cus at passwd.hu> wrote:
> On Sat, 31 Oct 2020, Dave Rice wrote:
>>> On Oct 31, 2020, at 3:47 PM, Marton Balint <cus at passwd.hu> wrote:
>>> On Sat, 31 Oct 2020, Dave Rice wrote:
>>>> Hi Marton,
>>>>> On Oct 31, 2020, at 12:56 PM, Marton Balint <cus at passwd.hu> wrote:
>>>>> Fixes out of sync timestamps in ticket #8762.
>>>> Although Michael’s recent patch does address the issue documented in 8762, I haven’t found this patch to fix the issue. I tried with -c:a copy and with -c:a pcm_s16le with some sample files that exhibit this issue, but each output was out of sync. I put an output at https://gist.github.com/dericed/659bd843bd38b6f24a60198b5e345795. That output notes that 3597 packets of video are read and 3586 packets of audio. In the resulting file, at the end of the timeline the audio is 9 frames out of sync and my output video stream is 00:02:00.020 and output audio stream is 00:01:59.653.
>>>> Beyond copying or encoding the audio, are there other options I should use to test this?
>>> Well, it depends on what you want. After this patch you should get a file which has audio packets synced to video, but the audio stream is sparse, not every video packet has a corresponding audio packet. (It looks like our MOV muxer does not support muxing of sparse audio therefore does not produce proper timestamps. But MKV does, please try that.)
>>> You can also make ffmpeg generate the missing audio based on packet timestamps. Swresample has an async=1 option, so something like this should get you synced audio with continuous audio packets:
>>> ffmpeg -y -i 1670520000_12.dv -c:v copy \
>>> -af aresample=async=1:min_hard_comp=0.01 -c:a pcm_s16le 1670520000_12.mov
>> 
>> Thank you for this. With the patch and async, the result is synced and the resulting audio was the same as Michael’s patch.
>> 
>> Could you explain why you used min_hard_comp here? IIUC min_hard_comp sets a threshold between the strategies of trim/fill or stretch/squeeze to align the audio to time; however, the async documentation says “Setting this to 1 will enable filling and trimming, larger values represent the maximum amount in samples that the data may be stretched or squeezed” so I thought that async=1 would not permit stretch/squeeze anyway.
> 
> It is documented poorly, but if you check the source code you will see that async=1 implicitly sets min_comp to 0.001 enabling trimming/dropping. min_hard_comp decides the threshold when silence injection actually happens, and the default for that is 0.1, which is more than a frame, therefore not acceptable if we want to maintain <1 frame accuracy. Or at least that is how I think it should work.
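
The threshold comparison above can be checked with a little arithmetic. The following is only a rough sketch: it assumes NTSC DV timing (30000/1001 fps) and 48 kHz audio, neither of which is stated in this thread, and threshold_report is a hypothetical helper name, not part of FFmpeg.

```python
from fractions import Fraction

# Assumptions (not stated in the thread): NTSC DV video at 30000/1001 fps
# and 48 kHz audio. PAL DV would use 25 fps instead.
FRAME_RATE = Fraction(30000, 1001)
SAMPLE_RATE = 48000

FRAME_DURATION = 1 / FRAME_RATE  # seconds per video frame


def threshold_report(min_hard_comp):
    """Express a min_hard_comp value (in seconds) as frames and samples."""
    seconds = Fraction(str(min_hard_comp))
    return {
        "seconds": float(seconds),
        "frames": float(seconds / FRAME_DURATION),
        "samples": int(seconds * SAMPLE_RATE),
        "below_one_frame": seconds < FRAME_DURATION,
    }


if __name__ == "__main__":
    # 0.1 is the default mentioned above; 0.01 is the value used in the command.
    for value in (0.1, 0.01):
        print(threshold_report(value))
```

Under these assumptions the default of 0.1 s is roughly three video frames, while 0.01 s is about a third of a frame, which matches the reasoning for lowering it.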

I’ve found that aresample=async=1:min_hard_comp=0.01, as discussed here, works well to add audio samples to maintain timestamp accuracy when muxing into a format like mov. However, this approach doesn’t work if the sparseness of the audio stream is at the end of the stream. Is there a way to use min_hard_comp to consider differences between a timestamp and audio data when one of the ends of that range is the end of the file?
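
To put a number on how much audio can be missing in total, here is a small sketch that converts the packet counts quoted earlier in this thread (3597 video, 3586 audio) into durations. It assumes NTSC DV timing, with one audio packet per video frame of 1001/30000 s; the helper names are made up for illustration.

```python
from fractions import Fraction

# Assumption: NTSC DV, so each packet (video frame or its audio) spans 1001/30000 s.
FRAME_DURATION = Fraction(1001, 30000)


def packets_to_seconds(packet_count):
    """Total duration covered by packet_count frame-sized packets."""
    return packet_count * FRAME_DURATION


def format_timestamp(seconds):
    """Format a duration in seconds as HH:MM:SS.mmm, rounded to milliseconds."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


if __name__ == "__main__":
    video = packets_to_seconds(3597)
    audio = packets_to_seconds(3586)
    print("video:", format_timestamp(video))  # 00:02:00.020
    print("audio:", format_timestamp(audio))  # 00:01:59.653
    print("shortfall in seconds:", float(video - audio))
```

The two durations agree with the stream durations reported earlier, so the total audio shortfall corresponds to the 11 missing audio packets, wherever in the stream they fall.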

Best Regards,
Dave Rice