[FFmpeg-devel] [PATCH]?avcodec_decode_audio3?and?multiple?frames in a packet
Sascha Sommer
saschasommer
Thu Sep 17 13:37:10 CEST 2009
Hi,
On Mittwoch, 16. September 2009, Michael Niedermayer wrote:
> On Wed, Sep 16, 2009 at 09:24:24PM +0200, Sascha Sommer wrote:
> > Hi,
> >
> > On Mittwoch, 16. September 2009, Michael Niedermayer wrote:
> > > On Wed, Sep 16, 2009 at 05:51:03PM +0200, Sascha Sommer wrote:
> > > > Hi,
> > > >
> > > > On Mittwoch, 16. September 2009, Michael Niedermayer wrote:
> > > > > On Wed, Sep 16, 2009 at 02:52:29PM +0200, Sascha Sommer wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On Samstag, 12. September 2009, Justin Ruggles wrote:
> > > > > > > Michael Niedermayer wrote:
> > > > > > > > On Fri, Sep 11, 2009 at 12:13:02PM +0200, Sascha Sommer
> > > > > > > > wrote: [...]
> > > > > > > >
> > > > > > > >> Previously a value of 0 meant that no frame was decoded.
> > > > > > > >
> > > > > > > > no
> > > > > > > > you read the docs backward, it says if no frame was decoded
> > > > > > > > return 0 it does not say that 0 means no frames have been
> > > > > > > > decoded, it could equally well mean 0 bytes used
> > > > > > >
> > > > > > > Ah, good. So, although the current text is technically correct
> > > > > > > if interpreted that way, it is ambiguous. Why do we need to
> > > > > > > have a 0 return value also possibly mean no frames have been
> > > > > > > decoded? If frame_size_ptr is set to 0, that always means no
> > > > > > > frames have been decoded, without regard to the return value.
> > > > > > > And a return value of 0 should mean zero bytes were used,
> > > > > > > without regard to what frame_size_ptr is set to. They seem
> > > > > > > mutually exclusive to me...
> > > > > >
> > > > > > I agree. The return value controls the number of input bytes,
> > > > > > frame_size_ptr the number of output bytes. I don't see why 0
> > > > > > needs to be returned when no frame was outputted.
> > > > >
> > > > > what exactly did the decoder then do with the data?
> > > > > and what was that data it did not decode?
> > > >
> > > > It maybe skipped it? For example when the packet contained only DSE
> > > > syntax elements in AAC. I did not check the spec if this can happen
> > > > or for what these Data Stream Elements are actually used but as we
> > > > already found out ffmpeg does
> > > > ? ? ? ? ? ? ? ? avpkt.data += ret;
> > > > ? ? ? ? ? ? ? ? avpkt.size -= ret;
> > > > So this will decode always the same data.
> > >
> > > yes
> > > a packet that is input to a decoder MUST in general contain 1 frame.
> >
> > 1 frame that can be decoded by the decoder, right?
> > But can we under all circumstances know what the decoder produces out of
> > this 1 input frame?
>
> no it can have an error and return a negative value representing the type
> of error
Maybe we are talking about different things. Let me try to explain again where
I have a problem with understanding the current version of
avcodec_decode_audio3:
avcodec_decode_audio3 says the following:
* @return On error a negative value is returned, otherwise the number of
bytes
* used or zero if no frame could be decompressed.
How is that written in C (probably the wrong interpretation)?
if(error) //On error a negative value is returned
return error;
if(number_of_bytes_used) //otherwise the number of bytes used
return number_of_bytes_used;
if(!*data_size) //or zero if no frame could be decompressed
return 0;
... //what to return here?
I think this can be reached with CODEC_CAP_DELAY when avpkt->size == 0
So there should also be return 0
So this looks to me the same as
if(error)
return error;
if(number_of_bytes_used)
return number_of_bytes_used;
and the "or zero if no frame could be decompressed." can be removed from the
API description. When number_of_bytes_used is 0 return 0.
However, judging from the discussion, I think the API description has to be
seen as.
if(error)
return error;
if(!*data_size)
return 0;
return number_of_bytes_used.
This would mean that 0 is always returned when there is no frame, no matter
what the value of number_of_bytes_used is.
ffmpeg.c contains the following code:
while (avpkt.size > 0 || (!pkt && ist->next_pts != ist->pts)) {
...
ret = avcodec_decode_audio3(ist->st->codec, samples,
&data_size,
&avpkt);
if (ret < 0)
goto fail_decode;
avpkt.data += ret;
avpkt.size -= ret;
/* Some bug in mpeg audio decoder gives */
/* data_size < 0, it seems they are overflows */
if (data_size <= 0) {
/* no audio frame */
continue;
}
...
So when there is no frame (ret = 0, data_size <= 0) ffmpeg will call
avcodec_decode_audio3 again with the same avpkt without incrementing
avpkt.data and without decreasing avpkt.size.
Now it depends on the decoder how it handles that.
I think there could be an endless loop when the packet is interpreted the same
way as during the first call or there might be a bitstream error when the
decoder expected a follow-up packet.
If I understand you correctly ("a packet that is input to a decoder MUST in
general contain 1 frame" and "my definition of frame contains a positive
number of samples, not 0 and not
negative"), the previously described case is not allowed to happen.
So why is this not a real error then?
I always assumed this could happen and this is why I thought that the API not
only needs to be clarified but also should be changed so that it always
returns the number of consumed bytes when there was no error to stay
consistent with the case when a frame is returned and to have an unambiguous
indicator of how many bytes have been used from the packet. When "no bytes
used from the input packet" is the same as "bytes used from the input packet
but no frame returned", the data_size variable needs to be checked to decide
if the next packet is to be decoded or the same packet is to be decoded
again.
Otherwise it would have been enough to check if all bytes from a packet have
been consumed or if an error occurred.
>
> > > Now, there are formats that use inseperable frames intermingled where
> > > a decoder hs to be feeded with more than 1 frame.
> > > packets that contain no frame at all would have a duration of 0, would
> > > have dts equal to the next packet would _not_ have a pts because they
> > > are not presented, practically no container could store them ...
> >
> > Still there could be a more or less "raw" format X that contains only the
> > information that it contains an audio frame of format Y with len Z bytes.
> > Audio format Y could have variable samples per frame, allowing even 0
> > samples
>
> We also could consider changing the API once these X-Y-Z turns up in
> reality besides my definition of frame contains a positive number of
> samples, not 0 and not negative
>
Maybe frame was not even the right name in the description of X-Y-Z.
What comes out of the demuxer are packets. And my "definition" of a packet is
simply a block of data. Such blocks of data can become something else when
there is a decoder that knows how to understand the packet.
With wmapro and other codecs one such packet can become multiple frames and I
assumed that also the other direction is possible: Sometimes such a packet
may not contain enough data for a frame without that condition being an
error.
I can't name real formats similar to X-Y-Z where frames merely change the
state of the decoder and do not result in output samples.
I checked the AAC spec and a packet always has to result in samples so my
example with a packet consisting of pure DSE elements does not happen.
I was unable to create a wmapro stream that had a first packet that contained
one and a half compressed frames (for wmapro the imdct output for the first
frame is discarded so this would cause such a no samples case)
I was unable to seek to a wmapro packet that did not contain at least one
complete frame so maybe this really is not allowed in wmapro then.
However changing avcodec_decode_audio3 to:
ret = avctx->codec->decode(avctx, samples, frame_size_ptr, avpkt);
avctx->frame_number++;
if (*frame_size_ptr == 0) {
return 0;
}
seems to result in an endless loop for
http://samples.mplayerhq.hu/A-codecs/lossless/luckynight.flac
So at least ffmpeg.c would need to be changed then and there should a better
description on how to handle the return values of avcodec_decode_audio3. This
is not clear from the description.
> > (these frames could for example contain some decode tables for the next
> > frames)
> > As format X is meant to be generic it does not know what audio format Y
> > stores in its frames and can't concatenate the frames so that always one
> > frame is outputted.
> >
> > > please dont change the API to support this, at least not without first
> > > explaining me how it could work on the muxer side or even how the
> > > demuxer side should deal with all the special cases for timestamps, it
> > > surely does not currently
> >
> > I think when the format is decoded, we only need to think about what
> > comes out of the decoder and here, when a new sample is outputted, the
> > sum of all previously outputted samples can be used to calculate the pts.
>
> not in reality, no sound recording device has a perfect clock, the sample
> rate is not exact nor is it perfectly constant.
> you might be lucky of course that its good enough or that it has been
> resampled before muxing but if not your sum of samples can differ from the
> correct pts.
> of course its good enough for a single packet but possibly not a whole file
Yes but either we have pts stored in the container and we can use it or we
don't have pts and then we have to calculate it otherwise. And if the
inventor of X-Y-Z thinks that pts is to be calculated from the number of
outputted samples, it should not matter if the decoder sometimes returns
without a frame as long as all returned frames have the correct pts.
> > Input packets
> > with duration 0 do not need to result in an output packet, or how are
> > they currently handled?
>
> like a bug in the demuxer, besides duration==0 means unknown duration in
> the code not zero duration.
>
I did not know that.
> > Muxing the compressed data is another story but please let's figure out
> > how to do the decoding first.
>
> both muxing and decoding work in ffmpeg since a few years, a change to the
> API will have to keep both functioning. So what is it that you meant?
>
> (yes i do feel a little upset about the API change discussion prior to
> ANY exlpanation why the current would be worse than the proposed)
> I would really prefer if you first would describe a _real_ situation in
> which the current is insufficent.
>
It was not my intention to upset you, sorry. I think that the description of
avcodec_decode_audio3 is unclear and by trying to understand it
I had the impression that there are use cases that are not handled well by the
current API. However it looks like they only need to be handled differently
in ffmpeg.c and the API can stay as is.
Regards
Sascha
More information about the ffmpeg-devel
mailing list