[FFmpeg-devel] [PATCH] avcodec: Vorbis decode: don't use a flag to determine if frames have been output

Mon Oct 17 11:23:11 EEST 2022

On Mon, Oct 17, 2022 at 10:18 AM Paul B Mahol <onemda at gmail.com> wrote:
>
> On 10/17/22, Hendrik Leppkes <h.leppkes at gmail.com> wrote:
> > On Thu, Sep 8, 2022 at 10:26 AM <jyrkive at nekonyansoft.com> wrote:
> >>
> >> From: Jyrki Vesterinen <jyrkive at nekonyansoft.com>
> >>
> >> If a developer using FFmpeg libraries seeks into an earlier position and
> >> calls
> >> avcodec_flush_buffers() afterwards as recommended, the Vorbis decoder will
> >> drop
> >> the next frame, since buffer flushing clears the first_frame flag. As a
> >> result,
> >> the audio samples the calling code receives may be ahead of the requested
> >> seek
> >> position, which is unacceptable in some use cases such as playing a
> >> looping
> >> sound effect.
> >>
> >> This commit removes the first_frame flag entirely and instead uses the
> >> presentation timestamp to determine if it's the first frame.
> >> ---
> >>  libavcodec/vorbisdec.c | 5 +----
> >>  1 file changed, 1 insertion(+), 4 deletions(-)
> >>
> >> diff --git a/libavcodec/vorbisdec.c b/libavcodec/vorbisdec.c
> >> index 4d03947c49..d4b030d7b9 100644
> >> --- a/libavcodec/vorbisdec.c
> >> +++ b/libavcodec/vorbisdec.c
> >> @@ -130,7 +130,6 @@ typedef struct vorbis_context_s {
> >>      AVFloatDSPContext *fdsp;
> >>
> >>      FFTContext mdct[2];
> >> -    uint8_t       first_frame;
> >>      uint32_t      version;
> >>      uint8_t       audio_channels;
> >>      uint32_t      audio_samplerate;
> >> @@ -1845,8 +1844,7 @@ static int vorbis_decode_frame(AVCodecContext
> >> *avctx, AVFrame *frame,
> >>      if ((len = vorbis_parse_audio_packet(vc, channel_ptrs)) <= 0)
> >>          return len;
> >>
> >> -    if (!vc->first_frame) {
> >> -        vc->first_frame = 1;
> >> +    if (frame->pts < 0) {
> >>          *got_frame_ptr = 0;
> >>          av_frame_unref(frame);
> >>          return buf_size;
> >> @@ -1881,7 +1879,6 @@ static av_cold void
> >> vorbis_decode_flush(AVCodecContext *avctx)
> >>                               sizeof(*vc->saved));
> >>      }
> >>      vc->previous_window = -1;
> >> -    vc->first_frame = 0;
> >>  }
> >>
> >>  const FFCodec ff_vorbis_decoder = {
> >> --
> >> 2.37.2.windows.2
> >>
> >
> > This change seems to be rather fragile and faulty, causing vorbis
> > decoding to fail in various scenarios for a bunch of downstream
> > projects.
> >
> > - A user may not set pts at all, resulting in all frames being dropped
> > (pure audio files don't necessarily need timestamps)
> > - A seek could happen before any frame is ever decoded, resulting in
> > wrong drops, potentially in the middle of playback if the user seeks
> > backwards after opening in the middle.
> >
> > In general, using timestamps to control decoder behavior is often just
> > wrong, as timestamps are not reliable, and most importantly, not tied
> > to the bitstream at all.
> >
> > Can we revert this and re-think the approach?
>
> Are you saying that previous solution was better than current one?
>
> By your own words its ever worse that current state.
>

At least the old solution consistently just dropped one frame after a
flush, not in the middle of playback, or dropping every single frame
because the user did not specify timestamps, breaking playback
entirely.

We already have mechanisms to properly drop padding data from the
front of a stream in generic code, that should ideally be used, and
not a decoder-specific hack.

- Hendrik