[FFmpeg-devel] [PATCH] First shot at AVCHD seeking via new seeking API
Ivan Schreter
Sun Jul 26 15:00:16 CEST 2009
Hi Baptiste,
first of all, thanks for the review.
I am also trying to make this search routine work for MPEG-PS, so I'll
have to change the code a bit (hopefully in only two places). I'll
explain in the comments below.
Baptiste Coudurier wrote:
> Hi Ivan,
>
> On 07/25/2009 04:14 AM, Ivan Schreter wrote:
>> Hi,
>>
>> [...]
>> If noone objects, I'll commit them during the next week.
>>
>
> Since it is only used in mpegts for now, please put all the code in
> mpegts.c, these symbols do not need to be exported.
Yes and no. As I wrote above, I'm trying to make this work for MPEG-PS
as well. I've seen that MPEG-PS uses the index API, but it doesn't seem
to seek correctly anyway. The same method used for seeking in MPEG-TS
can also be used for MPEG-PS (without an index). Other formats will
probably profit from it as well.
So either it lands in utils.c (which is IMHO the proper place) or I'll
create a new seek_utils.c containing the routine (and optionally move
other seek-relevant code from utils.c into the new file). Or possibly
some common MPEG source file (which one?). What do you suggest?
>
> Also, I think state must not be trashed if seek failed, that is
> av_read_frame_flush must not be called before seeking.
This is hard to achieve. In order to find keyframes, I have to start
reading frames via the parser, and this trashes the state. Only then do
I know whether the seek succeeds or not. Maybe it would be possible to
save the state and restore it on error. I'll look into it.
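Something along these lines might do (just a sketch; "seek_failed"
stands for whatever error condition the search reports):

    int64_t old_pos = url_ftell(s->pb);      // remember read position before searching

    // ... run the keyframe search; it reads packets and trashes the parser state ...

    if (seek_failed) {
        av_read_frame_flush(s);              // drop partially parsed data
        url_fseek(s->pb, old_pos, SEEK_SET); // restore the old read position
        // cur_dts of the streams would still have to be saved/restored separately
        return -1;
    }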
>
> > [...]
>>
>> +/**
>> + * Helper structure describing keyframe search state of one stream.
>> + */
>> +typedef struct {
>> +    int active;          ///< Set to 1 for active streams, 0 for inactive.
>
> I don't think inactive stream need a structure, so this field is
> unneeded.
Either I need an "active" flag, or I need to know the index of the stream
to which this struct refers. But when looking up the associated structure
for a packet just read, it's much easier to use the packet's stream_index
to find the appropriate structure in O(1) than to search for it in an
O(n) loop. Or did you mean something different?
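Roughly what I mean (a sketch, not the exact read loop from the patch):

    AVPacket pkt;
    while (av_read_frame(s, &pkt) >= 0) {
        AVSyncPoint *sp = &sync[pkt.stream_index];  // O(1) lookup by stream index
        if (!sp->active) {
            // inactive (discarded) stream, nothing to track for it
            av_free_packet(&pkt);
            continue;
        }
        // ... evaluate the packet for this stream ...
        av_free_packet(&pkt);
    }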
>
>> + int64_t epsilon; ///< Rounding error based on time base.
>
> Can you explain why it is needed ?
Timestamps are scaled back and forth between the stream time base (of one
or more streams, possibly with different time bases) and AV_TIME_BASE,
depending on how the API is called. The epsilon accounts for the rounding
errors between the time base of the stream and AV_TIME_BASE.
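Concretely, epsilon is one tick of the stream's time base expressed in
AV_TIME_BASE units, which bounds that rounding error (this is what the
patch sets it to):

    // one stream tick in AV_TIME_BASE units - a rescaled timestamp can be off
    // by up to this much from the target it is compared against
    sp->epsilon = AV_TIME_BASE * (int64_t)st->time_base.num / st->time_base.den;

    // hence the comparisons use this tolerance:
    //   pts_tb <= timestamp + sp->epsilon    (candidate "lo" keyframe)
    //   pts_tb + sp->epsilon >= timestamp    (candidate "hi" keyframe)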
>
> > [...]
> >
>> +
>> +/**
>> + * Partial search for keyframe in multiple streams.
>> + *
>> + * This routine searches for the next lower and next higher timestamp to
>> + * given target timestamp in each stream, starting at current file position
>> + * and ending at position, where all streams have already been examined
>> + * (or when all higher key frames found in first iteration).
>> + *
>> + * This routine is called iteratively with exponential backoff to find lower
>> + * timestamp.
>> + *
>> + * @param s                 format context.
>> + * @param timestamp         target timestamp (or position, if AVSEEK_FLAG_BYTE).
>> + * @param flags             seeking flags.
>> + * @param sync              array with information per stream.
>> + * @param keyframes_to_find count of keyframes to find in total.
>> + * @param found_lo          pointer to count of already found low timestamp keyframes.
>> + * @param found_hi          pointer to count of already found high timestamp keyframes.
>> + */
>> +static void search_keyframe_multi_partial(AVFormatContext *s,
>> + int64_t timestamp,
>> + int flags,
>> + AVSyncPoint *sync,
>> + int keyframes_to_find,
>> + int *found_lo,
>> + int *found_hi)
>
> I don't think the name is adequate. I'm thinking of a better one.
Yeah, the name is not particularly good, so if you have a better idea,
it's welcome.
>
> [...]
>> +            // Use file position as fallback, if no position returned. This is
>> +            // inexact in most formats, so format parser should always fill
>> +            // proper position to make seeking routine return reliable results.
>> +            pos = fpos;
>> +        }
>
> Actually I think fpos is more reliable in our situation :)
Actually, it is not. As you know, in MPEG a frame is distributed over
several packets, interleaved with packets of frames from other streams.
The frame is only returned from the parser after its last packet has been
read. Thus fpos is unreliable, since it's the position of the last packet
of the frame, but we need the first one. It's just a fallback, which may
or may not work for a particular format. It wouldn't work for MPEG.
>
>> +        if (sp->first_pos == AV_NOPTS_VALUE) {
>> +            // Note down termination position for the next iteration - when
>> +            // we encounter a packet with the same position, we will ignore
>> +            // any further packets for this stream in next iteration (as they
>> +            // are already evaluated).
>> +            sp->first_pos = pos;
>> +        }
>> +
>> +        if (sp->term_pos != AV_NOPTS_VALUE && pos > sp->term_pos) {
>> +            // We are past the end position from last iteration, ignore packet.
>> +            if (!sp->terminated) {
>> +                sp->terminated = 1;
>> +                ++terminated_count;
>> +                if (terminated_count == keyframes_to_find)
>> +                    break; // all terminated, iteration done
>> +            }
>> +            continue;
>> +        }
>
> Do term_pos and first_pos need to be per stream ?
Yes, for the same reason as above: frames of different streams are
interleaved in packets, so the starting positions of frames in the
various streams don't necessarily correlate with the order of their
arrival via av_read_frame().
>
>> +        // Evaluate key frames with known TS (or any frames, if AVSEEK_FLAG_ANY set).
>> +        if (pts != AV_NOPTS_VALUE &&
>> +            ((flg & PKT_FLAG_KEY) || (flags & AVSEEK_FLAG_ANY))) {
>> +            // Rescale stream timestamp to AV_TIME_BASE.
>> +            st = s->streams[idx];
>> +            pts_tb = av_rescale(pts, AV_TIME_BASE * (int64_t)st->time_base.num,
>> +                                st->time_base.den);
>> +            dts_tb = av_rescale(dts, AV_TIME_BASE * (int64_t)st->time_base.num,
>> +                                st->time_base.den);
>> +
>> +            if (flags & AVSEEK_FLAG_BYTE) {
>> +                // Seek by byte position.
>> +                if (pos <= timestamp) {
>> +                    // Keyframe found before target position.
>> +                    if (!sp->found_lo) {
>> +                        // Found first keyframe lower than target position.
>> +                        sp->found_lo = 1;
>> +                        (*found_lo)++;
>> +                        sp->pts_lo = pts_tb;
>> +                        sp->dts_lo = dts_tb;
>> +                        sp->pos_lo = pos;
>> +                    } else if (sp->pos_lo < pos) {
>> +                        // Found a better match (closer to target position).
>> +                        sp->pts_lo = pts_tb;
>> +                        sp->dts_lo = dts_tb;
>> +                        sp->pos_lo = pos;
>> +                    }
>> +                }
>> +                if (pos >= timestamp) {
>> +                    // Keyframe found after target position.
>> +                    if (!sp->found_hi) {
>> +                        // Found first keyframe higher than target position.
>> +                        sp->found_hi = 1;
>> +                        (*found_hi)++;
>> +                        sp->pts_hi = pts_tb;
>> +                        sp->dts_hi = dts_tb;
>> +                        sp->pos_hi = pos;
>> +                        if (*found_hi >= keyframes_to_find && sp->term_pos == INT64_MAX) {
>> +                            // We found high frame for all. They may get updated
>> +                            // to TS closer to target TS in later iterations (which
>> +                            // will stop at start position of previous iteration).
>> +                            break;
>> +                        }
>> +                    } else if (sp->pos_hi > pos) {
>> +                        // Found a better match (actually, shouldn't happen).
>> +                        sp->pts_hi = pts_tb;
>> +                        sp->dts_hi = dts_tb;
>> +                        sp->pos_hi = pos;
>> +                    }
>> +                }
>> +            } else {
>> +                // Seek by timestamp.
>> +                if (pts_tb <= timestamp + sp->epsilon) {
>> +                    // Keyframe found before target timestamp.
>> +                    if (!sp->found_lo) {
>> +                        // Found first keyframe lower than target timestamp.
>> +                        sp->found_lo = 1;
>> +                        (*found_lo)++;
>> +                        sp->pts_lo = pts_tb;
>> +                        sp->dts_lo = dts_tb;
>> +                        sp->pos_lo = pos;
>> +                    } else if (sp->pts_lo < pts_tb) {
>> +                        // Found a better match (closer to target timestamp).
>> +                        sp->pts_lo = pts_tb;
>> +                        sp->dts_lo = dts_tb;
>> +                        sp->pos_lo = pos;
>> +                    }
>> +                }
>> +                if (pts_tb + sp->epsilon >= timestamp) {
>> +                    // Keyframe found after target timestamp.
>> +                    if (!sp->found_hi) {
>> +                        // Found first keyframe higher than target timestamp.
>> +                        sp->found_hi = 1;
>> +                        (*found_hi)++;
>> +                        sp->pts_hi = pts_tb;
>> +                        sp->dts_hi = dts_tb;
>> +                        sp->pos_hi = pos;
>> +                        if (*found_hi >= keyframes_to_find && sp->term_pos == INT64_MAX) {
>> +                            // We found high frame for all. They may get updated
>> +                            // to TS closer to target TS in later iterations (which
>> +                            // will stop at start position of previous iteration).
>> +                            break;
>> +                        }
>
> If I understood correctly, at first iteration you should find high
> keyframes for all streams.
Not necessarily. Normally, yes, but there is a special case: a key frame
in a stream that begins before the rough position and ends after it.
That frame won't be returned in the first iteration, only possibly a
later key frame. It will only be found in a later iteration (or not at
all, if seeking near the end of the file).
>
> If SEEK_BYTE is requested and the pos you are at is > max_ts (max
> position) or term_pos was earlier at the previous iteration, then you
> can break earlier after read_frame anyway.
>
> Do I miss something ?
Again, interleaved frames in MPEG. This makes the stuff somewhat complex.
>
>> +                    } else if (sp->pts_hi > pts_tb) {
>> +                        // Found a better match (actually, shouldn't happen).
>> +                        sp->pts_hi = pts_tb;
>> +                        sp->dts_hi = dts_tb;
>> +                        sp->pos_hi = pos;
>> +                    }
>
> Duplicate code between SEEK_BYTE and timestamps. I believe you only
> need one hi and one lo per stream (will be set to either pos or
> timestamp), you don't need both pts and dts. Please explain why if
> that's not possible.
In the SEEK_BYTE case, I only need pos. In the normal case, I need pos
and pts. As for dts, I'm going to solve this differently (see below),
since the current DTS update code is inexact, which causes problems with
MPEG-PS streams.
>
>> + }
>> + }
>> + }
>> +
>> + // Clean up the parser.
>> + av_read_frame_flush(s);
>> +}
>> +
>> +int64_t ff_gen_syncpoint_search(AVFormatContext *s,
>> + int stream_index,
>> + int64_t pos,
>> + int64_t ts_min,
>> + int64_t ts,
>> + int64_t ts_max,
>> + int flags)
>> +{
>> + AVSyncPoint sync[MAX_STREAMS], *sp;
>> + AVStream *st;
>> + int i;
>> + int keyframes_to_find = 0;
>> + int64_t curpos;
>> + int64_t step;
>> + int found_lo = 0, found_hi = 0;
>> + int64_t min_distance;
>> + int64_t min_pos = 0;
>> +
>> +    if (stream_index >= 0 && !(flags & AVSEEK_FLAG_BYTE)) {
>> +        // Rescale timestamps to AV_TIME_BASE, if timestamp of a reference stream given.
>> +        st = s->streams[stream_index];
>> +        if (ts != AV_NOPTS_VALUE)
>> +            ts = av_rescale(ts, AV_TIME_BASE * (int64_t)st->time_base.num,
>> +                            st->time_base.den);
>> +        if (ts_min != INT64_MIN)
>> +            ts_min = av_rescale(ts_min, AV_TIME_BASE * (int64_t)st->time_base.num,
>> +                                st->time_base.den);
>> +        if (ts_max != INT64_MAX)
>> +            ts_max = av_rescale(ts_max, AV_TIME_BASE * (int64_t)st->time_base.num,
>> +                                st->time_base.den);
>> +    }
>
> I don't think rescaling is needed for now since in mpegts all streams
> have same timebase.
It is, since stream_index can be -1, which means the timestamps are in
AV_TIME_BASE. So I'd have to rescale in the stream_index == -1 case even
if the code were MPEG-specific. The rest of the code works in
AV_TIME_BASE anyway, and it's generic enough for other formats as well.
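In other words (a sketch of the intent, essentially what the code above
does):

    if (stream_index >= 0 && !(flags & AVSEEK_FLAG_BYTE)) {
        // caller passed timestamps in the reference stream's time base,
        // bring them to AV_TIME_BASE
        AVStream *st = s->streams[stream_index];
        ts = av_rescale(ts, AV_TIME_BASE * (int64_t)st->time_base.num,
                        st->time_base.den);
    }
    // with stream_index == -1, ts is already in AV_TIME_BASE and the rest
    // of the routine works in AV_TIME_BASE regardless of the format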
>
>> +    // Initialize syncpoint structures for each stream.
>> +    for (i = 0; i < s->nb_streams; ++i) {
>> +        sp = &sync[i];
>> +        st = s->streams[i];
>> +        if (s->streams[i]->discard < AVDISCARD_ALL) {
>> +            ++keyframes_to_find;
>> +            sp->pos_lo = INT64_MAX;
>> +            sp->pts_lo = INT64_MAX;
>> +            sp->dts_lo = INT64_MAX;
>> +            sp->found_lo = 0;
>> +            sp->pos_hi = INT64_MAX;
>> +            sp->pts_hi = INT64_MAX;
>> +            sp->dts_hi = INT64_MAX;
>> +            sp->found_hi = 0;
>> +            sp->epsilon = AV_TIME_BASE * (int64_t)st->time_base.num / st->time_base.den;
>> +            sp->term_pos = INT64_MAX;
>> +            sp->first_pos = AV_NOPTS_VALUE;
>> +            sp->terminated = 0;
>> +            sp->active = 1;
>> +        } else {
>> +            sp->active = 0;
>> +        }
>> +    }
>> +
>> + if (keyframes_to_find == 0) {
>> + // No stream active, error.
>> + return -1;
>> + }
>> +
>> +    // Find keyframes in all active streams with timestamp/position just before
>> +    // and just after requested timestamp/position.
>> +    step = 1024;
>> +    for (;;) {
>> +        curpos = pos - step;
>> +        if (curpos < 0)
>> +            curpos = 0;
>> +        url_fseek(s->pb, curpos, SEEK_SET);
>> +        search_keyframe_multi_partial(s, ts, flags, sync, keyframes_to_find,
>> +                                      &found_lo, &found_hi);
>> +        if (found_lo == keyframes_to_find && found_hi == keyframes_to_find)
>> +            break; // have all keyframes we wanted
>
> You can shortcut by checking if user specified a max_pos > ts or
> min_pos < ts. In this case only found_lo or found_hi matters.
I'm not sure.
The max_pos/min_pos/max_ts/min_ts don't say "don't give me anything
outside this range", but rather "make sure my decoder output synchronizes
in this range". I.e., it's perfectly OK to return a position before
min_pos/min_ts.
Using only found_hi for a forward seek is not OK, since one stream might
have a key frame past the requested timestamp, while one or more other
streams have key frames before it. So even though the decoder
synchronizes at a timestamp greater than the requested one, we need to
start reading before the requested timestamp in order to synchronize as
close to it as possible.
Using only found_lo for a backward seek might work, but why introduce a
special case? It's necessary anyway to scan each stream up to the high
key frame, since our rough seeking position might be too early, so there
can be several lo key frames after this position.
>
>> +        if (curpos == 0)
>> +            break; // cannot go back anymore
>> +        step *= 2;
>> +        // switch termination positions
>> +        for (i = 0; i < s->nb_streams; ++i) {
>> +            sp = &sync[i];
>> +            if (sp->active) {
>> +                sp->term_pos = sp->first_pos;
>> +                sp->first_pos = AV_NOPTS_VALUE;
>> +                sp->terminated = 0;
>> +            }
>
> Are both first_pos and term_pos needed ?
Yes. first_pos is the position of the first frame found in the current
iteration. It is copied to term_pos, so the next iteration knows this
stream has already been read up to this position and any frames coming
afterwards are ignored. But other streams can still have frames coming
in. Only after all streams have reached their term_pos do we know that we
have really read all the frames needed.
BTW, this will have to be adjusted by keeping first_pos only if the frame
has a known timestamp; otherwise we could miss a key frame. I think this
is one of the problems with using the routine for MPEG-PS.
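The adjustment I have in mind would look roughly like this (sketch only):

    if (sp->first_pos == AV_NOPTS_VALUE && pts != AV_NOPTS_VALUE) {
        // only remember the termination position for the next iteration if the
        // frame has a usable timestamp; otherwise a keyframe whose timestamp is
        // not yet known could be skipped in the next iteration
        sp->first_pos = pos;
    }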
>
>> +        }
>> +
>> +    // Find actual position to start decoding so that decoder synchronizes
>> +    // closest to ts and update timestamps in streams.
>> +    pos = INT64_MAX;
>> +
>> +    for (i = 0; i < s->nb_streams; ++i) {
>> +        sp = &sync[i];
>> +        if (sp->active) {
>> +            min_distance = INT64_MAX;
>> +            if (flags & AVSEEK_FLAG_BYTE) {
>> +                // Find closest byte position to requested position, within min/max limits.
>> +                if (sp->pos_lo != INT64_MAX && ts_min <= sp->pos_lo && sp->pos_lo <= ts_max) {
>> +                    // low position is in range
>> +                    min_distance = ts - sp->pos_lo;
>> +                    min_pos = sp->pos_lo;
>> +                }
>> +                if (sp->pos_hi != INT64_MAX && ts_min <= sp->pos_hi && sp->pos_hi <= ts_max) {
>> +                    // high position is in range, check distance
>> +                    if (sp->pos_hi - ts < min_distance) {
>> +                        min_distance = sp->pos_hi - ts;
>> +                        min_pos = sp->pos_hi;
>> +                    }
>> +                }
>> +            } else {
>> +                // Find timestamp closest to requested timestamp within min/max limits.
>> +                if (sp->pts_lo != INT64_MAX &&
>> +                    ts_min <= sp->pts_lo + sp->epsilon && sp->pts_lo - sp->epsilon <= ts_max) {
>> +                    // low timestamp is in range
>> +                    min_distance = ts - sp->pts_lo;
>> +                    min_pos = sp->pos_lo;
>> +                }
>> +                if (sp->pts_hi != INT64_MAX &&
>> +                    ts_min <= sp->pts_hi + sp->epsilon && sp->pts_hi - sp->epsilon <= ts_max) {
>> +                    // high timestamp is in range, check distance
>> +                    if (sp->pts_hi - ts < min_distance) {
>> +                        min_distance = sp->pts_hi - ts;
>> +                        min_pos = sp->pos_hi;
>> +                    }
>> +                }
>> +            }
>
> Duplicate code between SEEK_BYTE and timestamps.
No. In the SEEK_BYTE case positions are compared, in the normal case
timestamps, so it's not a duplicate.
>
>> +            if (min_distance == INT64_MAX) {
>> +                // no timestamp is in range, cannot seek
>> +                return -1;
>> +            }
>> +            if (min_pos < pos)
>> +                pos = min_pos;
>> +        }
>> +    }
>> +
>> +    // We now have a position, so update timestamps of all streams appropriately.
>> +    for (i = 0; i < s->nb_streams; ++i) {
>> +        sp = &sync[i];
>> +        if (sp->active) {
>> +            if (sp->found_lo && sp->pos_lo >= pos) {
>> +                av_update_cur_dts(s, s->streams[i], sp->dts_lo);
>> +            } else if (sp->found_hi && sp->pos_hi >= pos) {
>> +                av_update_cur_dts(s, s->streams[i], sp->dts_hi);
>> +            } else {
>> +                // error, no suitable position found (actually impossible)
>> +                return -1;
>> +            }
>> +        }
>> +    }
>
> Do you really need update cur_dts ?
Yes. And it actually has to be updated to the correct DTS, otherwise
timestamp generation diverges and seeking is incorrect (in MPEG-PS). I'm
going to change the DTS computation to produce an exact DTS. I'm thinking
about simply putting all (stream_index, position, dts) triples
encountered into an array and then picking the closest higher DTS for the
seeking position. This array should be rather small (up to about 1.4
seconds worth of frames in MPEG, for instance, so maybe a couple hundred
entries), so I don't think any more advanced data structure is necessary.
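Roughly like this (a sketch with made-up names, not part of the current
patch):

    typedef struct {
        int     stream_index;
        int64_t pos;    // position of the first packet of the frame
        int64_t dts;    // decoding timestamp of the frame
    } DTSEntry;

    // Return the DTS of the first recorded frame of the given stream at or
    // after the chosen seek position, or AV_NOPTS_VALUE if none was recorded.
    static int64_t dts_at_or_after(const DTSEntry *e, int n,
                                   int stream_index, int64_t pos)
    {
        int     i;
        int64_t best_pos = INT64_MAX, best_dts = AV_NOPTS_VALUE;
        for (i = 0; i < n; i++) {
            if (e[i].stream_index == stream_index &&
                e[i].pos >= pos && e[i].pos < best_pos) {
                best_pos = e[i].pos;
                best_dts = e[i].dts;
            }
        }
        return best_dts;
    }

The seeking routine would fill such an array while scanning and then feed
the result to av_update_cur_dts() for each active stream.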
Regards,
Ivan