[FFmpeg-devel] [PATCH] First shot at AVCHD seeking via new seeking API

Mon Aug 3 04:35:25 CEST 2009

Hi Ivan,

On 08/02/2009 10:46 AM, Ivan Schreter wrote:
>>>>> [...]
>>>>>
>>>>> +/**
>>>>> + * Helper structure describing keyframe search state of one stream.
>>>>> + */
>>>>> +typedef struct {
>>>>> + int active; ///< Set to 1 for active streams, 0 for inactive.
>>>>
>>>> I don't think inactive stream need a structure, so this field is
>>>> unneeded.
>>> Either I need an "active" flag, or I need to know index of the stream to
>>> which this struct refers. But when searching for associated structure
>>> from read packet, it's much easier to use stream_index of the packet to
>>> find the appropriate structure in O(1) than looking for it via O(n)
>>> loop.
>>>
>>> Or did you mean something different?
>> > [...]
>>
>> You can check for st->discard.
>> You can also save the memory of AVSyncPoint struct for discarded
>> streams and keep O(1) lookup.
>>
>> And please don't use MAX_STREAMS, in the future we need to support
>> arbitrary number of streams.
> I've changed it to use an allocated array of structures, as big as
> needed for nb_streams. But for now I'm going to keep active flag, since
> something must be used to detect whether the stream is relevant or not.
>
> I suppose you thought about allocating array of pointers and then
> allocating helper structure only for relevant streams. Then, it's
> possible to check against NULL, instead of using active flag. I can
> rework it this way if it's preferred, but I don't see much benefit.

I think you can check for st->discard and therefore don't need active flag.

 >>>> [...]
>>>>
>>>> Do term_pos and first_pos need to be per stream ?
>>> Yes. Reason as above - frames for various streams interleaved in
>>> packets, so the starting position of frames in various streams don't
>>> necessarily correlate with their arrival via av_read_frame().
>>
>> Scenario is currently for first iteration:
>> seek at pos
>> read frames until you find hi keyframes for all streams.
>> The Lowest pos is the first keyframe found. It may not be in order but
>> that doesn't matter at the first iteration.
>>
>> On next iterations you have no interest in going past the first pos
>> since read_frame will and if not must return the same frames in the
>> same order at the same pos, whatever interleaving there is. If it
>> doesn't problem is somewhere else and must be fixed.
>>
>> So you only need one first_pos for all the streams.
>> And next iterations will find keyframes before pos. That's why I don't
>> think you need term_pos nor terminated either.
> Unfortunately, no. Again, consider this case:
>
> pos - 1: 1st PES packet of video frame
> pos + 0: PES packet containing audio frame
> pos + 1: 2nd PES packet of video frame
>
> now, we start reading from somewhere before pos. read_frame will first
> return audio frame at pos, where we'd stop reading.

For the first iteration we continue reading until the hi keyframe is 
found, if I understood correctly.

> But the video frame,  which begins at pos - 1 would not be read, since  > we need to continue  reading to read the frame with position pos - 1.
 > This is no bug, but a feature...

Yes, however if lo keyframe is not found, code will seek back before pos 
again to find the lo keyframe for video.
And in this second iteration you don't need to read past the previous pos.

 >>>> [...]
>>>>
>>>> If I understood correctly, at first iteration you should find high
>>>> keyframes for all streams.
>>> Not necessarily. Normally, yes, but there is a special case, with key
>>> frame in a stream beginning before rough position and ending after it.
>>> The frame won't be returned in the first iteration, only possibly a
>>> later key frame. It will only be found in later iteration (or not found
>>> at all, if seeking against end of the file).
>>
>> You are describing the case of the first keyframe here right ?
>> It will be returned at the next iteration so it's fine and lo will be
>> updated.
> I'm describing a case where:
>
> pos - 1: 1st PES packet of key frame of stream 1
> pos + 0: 1st and only PES packet of key frame of stream 0
> pos + 1: 2nd and last PES packet of key frame of stream 1
>
> Generic seeking routine might return position of key frame of stream 0,
> since it's PTS matches the best. But actually, position pos - 1 is the
> correct one, since otherwise the decoder would not synchronize correctly
> at requested timestamp.

Yes, yo have to seek back before pos - 1 to find the lo keyframe, that 
will be done during the second iteration anyway.

 >> [...]
>>
>> Why do you need pos _and_ pts per stream ? What is pos for ?
> Well, I need pos in order to find actual position where to reposition
> the file pointer...

Ok.

 >>>> [...]
>>>>
>>>> You can shortcut by checking if user specified a max_pos > ts or
>>>> min_pos < ts. In this case only found_lo or found_hi matters.
>>> I'm not sure.
>>>
>>> The max_pos/min_pos/max_ts/min_ts don't say "don't give me anything
>>> outside this range", rather "make sure my decoder output synchronizes in
>>> this range". I.e., it's perfectly OK to return a position before
>>> min_pos/min_ts.
>>
>> I think, given the parameters name, it exactly says don't give me
>> anything outside this range, that's why the API has been extended.
>> So it's not ok to return anything outside this range.
> According to the API docs:
>
> "Seeking will be done so that the point from which all active streams
> can be presented successfully will be closest to ts and within min/max_ts."
>
> It says, "the point from which all active streams can be presented
> successfully", i.e., where they synchronize or put differently, where
> each of them encounters a key frame, must be within min/max_ts, NOT the
> point where we reposition the file. Therefore, data _before_ min_ts must
> be taken into account and therefore the current code is correct.

Yes, this is what is in the docs, I said you could shortcut if ts == 
max_ts or ts == min_ts.

I think it's more: first correctly decoded (except is AVSEEK_FLAG_ANY is 
given) frame pts or pos will be within the range.

Explicitly, if you have a keyframe > ts and max_ts == ts or keyframe < 
min_ts and ts == min_ts it's not ok.

Users wants accurate seeking.

 > [...]
 >
> +    int64_t term_pos;   ///<  Termination position, where we already read packets.
> +    int64_t first_pos;  ///<  First packet position in this iteration.
> +    int     terminated; ///<  Termination flag.

Can you try to remove these ?

 > [...]
 >
> +
> +    for (;;) {
> +        fpos = url_ftell(s->pb);
> +        if (av_read_frame(s,&pkt)<  0) {
> +            // EOF or error, make sure high flags are set
> +            for (idx = 0; idx<  s->nb_streams; ++idx) {
> +                sp =&sync[idx];
> +                if (sp->active) {

	s->streams[idx]->discard

> +                    if (!sp->found_hi) {
> +                        // No high frame exists for this stream
> +                        sp->found_hi = 1;
> +                        (*found_hi)++;
> +                        sp->pts_hi = INT64_MAX;
> +                        sp->pos_hi = INT64_MAX;

Why do you set pts and pos ?
Can't you just check if pts_hi is set or not ?

> +                    }
> +                }
> +            }
> +            break;
> +        }
> +        idx = pkt.stream_index;
> +        sp =&sync[idx];
> +
> +        if (!sp->active) {

s->streams[idx]->discard

 > [...]
 >
> +        dts = pkt.dts;
> +        if (pts == AV_NOPTS_VALUE) {
> +            // Some formats don't provide PTS, only DTS.
> +            pts = dts;
> +        }

This is wrong if we have delay.

[...]

-- 
Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
FFmpeg maintainer                                  http://www.ffmpeg.org