[FFmpeg-devel] [PATCH] First shot at AVCHD seeking via new seeking API
Ivan Schreter
Sun Jul 26 15:00:16 CEST 2009
Hi Baptiste,
first of all, thanks for the review.
I am also trying to make this search routine work for MPEG-PS, so I'll
have to change the code a bit (hopefully in only two places). I'll
explain in the comments below.
Baptiste Coudurier wrote:
> Hi Ivan,
>
> On 07/25/2009 04:14 AM, Ivan Schreter wrote:
>> Hi,
>>
>> [...]
>> If noone objects, I'll commit them during the next week.
>>
>
> Since it is only used in mpegts for now, please put all the code in
> mpegts.c, these symbols do not need to be exported.
Yes and no. As I wrote above, I'm trying to make this work for MPEG-PS
as well. I've seen that MPEG-PS uses the index API, but it doesn't seem
to seek correctly anyway. The same method used for seeking in MPEG-TS
can also be used for MPEG-PS (without an index). Other formats will
probably profit from it as well.
So either it lands in utils.c (which is IMHO the proper place) or I'll
create a new seek_utils.c containing the routine (and optionally move
other seek-relevant code from utils.c into the new file). Or possibly
some common MPEG source file (which one?). What do you suggest?
>
> Also, I think state must not be trashed if seek failed, that is
> av_read_frame_flush must not be called before seeking.
This is hard to achieve. In order to find keyframes, I have to start
reading frames via the parser, and this trashes the state. Only then do
I know whether the seek succeeds or not. Maybe it would be possible to
save the state and restore it on error. I'll look into it.
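Something along these lines might do (just a sketch; "seek_failed"
stands for whatever error condition the search reports):

    int64_t old_pos = url_ftell(s->pb);      // remember read position before searching

    // ... run the keyframe search; it reads packets and trashes the parser state ...

    if (seek_failed) {
        av_read_frame_flush(s);              // drop partially parsed data
        url_fseek(s->pb, old_pos, SEEK_SET); // restore the old read position
        // cur_dts of the streams would still have to be saved/restored separately
        return -1;
    }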
>
> > [...]
>>
>> +/**
>> + * Helper structure describing keyframe search state of one stream.
>> + */
>> +typedef struct {
>> +    int active;          ///< Set to 1 for active streams, 0 for inactive.
>
> I don't think inactive stream need a structure, so this field is
> unneeded.
Either I need an "active" flag, or I need to know the index of the stream
to which this struct refers. But when looking up the associated structure
for a packet just read, it's much easier to use the packet's stream_index
to find the appropriate structure in O(1) than to search for it in an
O(n) loop. Or did you mean something different?
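Roughly what I mean (a sketch, not the exact read loop from the patch):

    AVPacket pkt;
    while (av_read_frame(s, &pkt) >= 0) {
        AVSyncPoint *sp = &sync[pkt.stream_index];  // O(1) lookup by stream index
        if (!sp->active) {
            // inactive (discarded) stream, nothing to track for it
            av_free_packet(&pkt);
            continue;
        }
        // ... evaluate the packet for this stream ...
        av_free_packet(&pkt);
    }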
>
>> + int64_t epsilon; ///< Rounding error based on time base.
>
> Can you explain why it is needed ?
Timestamps are scaled back and forth between the stream time base (of one
or more streams, possibly with different time bases) and AV_TIME_BASE,
depending on how the API is called. The epsilon accounts for the rounding
errors between the time base of the stream and AV_TIME_BASE.
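Concretely, epsilon is one tick of the stream's time base expressed in
AV_TIME_BASE units, which bounds that rounding error (this is what the
patch sets it to):

    // one stream tick in AV_TIME_BASE units - a rescaled timestamp can be off
    // by up to this much from the target it is compared against
    sp->epsilon = AV_TIME_BASE * (int64_t)st->time_base.num / st->time_base.den;

    // hence the comparisons use this tolerance:
    //   pts_tb <= timestamp + sp->epsilon    (candidate "lo" keyframe)
    //   pts_tb + sp->epsilon >= timestamp    (candidate "hi" keyframe)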
>
> > [...]
> >
>> +
>> +/**
>> + * Partial search for keyframe in multiple streams.
>> + *
>> + * This routine searches for the next lower and next higher timestamp to
>> + * given target timestamp in each stream, starting at current file position
>> + * and ending at position, where all streams have already been examined
>> + * (or when all higher key frames found in first iteration).
>> + *
>> + * This routine is called iteratively with exponential backoff to find lower
>> + * timestamp.
>> + *
>> + * @param s                 format context.
>> + * @param timestamp         target timestamp (or position, if AVSEEK_FLAG_BYTE).
>> + * @param flags             seeking flags.
>> + * @param sync              array with information per stream.
>> + * @param keyframes_to_find count of keyframes to find in total.
>> + * @param found_lo          pointer to count of already found low timestamp keyframes.
>> + * @param found_hi          pointer to count of already found high timestamp keyframes.
>> + */
>> +static void search_keyframe_multi_partial(AVFormatContext *s,
>> + int64_t timestamp,
>> + int flags,
>> + AVSyncPoint *sync,
>> + int keyframes_to_find,
>> + int *found_lo,
>> + int *found_hi)
>
> I don't think the name is adequate. I'm thinking of a better one.
Yeah, the name is not particularly good, so if you have a better idea,
it's welcome.
>
> [...]
>> +            // Use file position as fallback, if no position returned. This is
>> +            // inexact in most formats, so format parser should always fill
>> +            // proper position to make seeking routine return reliable results.
>> +            pos = fpos;
>> +        }
>
> Actually I think fpos is more reliable in our situation :)
Actually, it is not. As you know, in MPEG a frame is distributed over
several packets, interleaved with packets of frames from other streams.
The frame is only returned from the parser after its last packet has been
read. Thus fpos is unreliable, since it's the position of the last packet
of the frame, but we need the first one. It's just a fallback, which may
or may not work for a particular format. It wouldn't work for MPEG.
>
>> +        if (sp->first_pos == AV_NOPTS_VALUE) {
>> +            // Note down termination position for the next iteration - when
>> +            // we encounter a packet with the same position, we will ignore
>> +            // any further packets for this stream in next iteration (as they
>> +            // are already evaluated).
>> +            sp->first_pos = pos;
>> +        }
>> +
>> +        if (sp->term_pos != AV_NOPTS_VALUE && pos > sp->term_pos) {
>> +            // We are past the end position from last iteration, ignore packet.
>> +            if (!sp->terminated) {
>> +                sp->terminated = 1;
>> +                ++terminated_count;
>> +                if (terminated_count == keyframes_to_find)
>> +                    break; // all terminated, iteration done
>> +            }
>> +            continue;
>> +        }
>
> Do term_pos and first_pos need to be per stream ?
Yes, for the same reason as above: frames of different streams are
interleaved in packets, so the starting positions of frames in the
various streams don't necessarily correlate with the order of their
arrival via av_read_frame().
>
>> +        // Evaluate key frames with known TS (or any frames, if AVSEEK_FLAG_ANY set).
>> +        if (pts != AV_NOPTS_VALUE &&
>> +            ((flg & PKT_FLAG_KEY) || (flags & AVSEEK_FLAG_ANY))) {
>> +            // Rescale stream timestamp to AV_TIME_BASE.
>> +            st = s->streams[idx];
>> +            pts_tb = av_rescale(pts, AV_TIME_BASE * (int64_t)st->time_base.num,
>> +                                st->time_base.den);
>> +            dts_tb = av_rescale(dts, AV_TIME_BASE * (int64_t)st->time_base.num,
>> +                                st->time_base.den);
>> +
>> +            if (flags & AVSEEK_FLAG_BYTE) {
>> +                // Seek by byte position.
>> +                if (pos <= timestamp) {
>> +                    // Keyframe found before target position.
>> +                    if (!sp->found_lo) {
>> +                        // Found first keyframe lower than target position.
>> +                        sp->found_lo = 1;
>> +                        (*found_lo)++;
>> +                        sp->pts_lo = pts_tb;
>> +                        sp->dts_lo = dts_tb;
>> +                        sp->pos_lo = pos;
>> +                    } else if (sp->pos_lo < pos) {
>> +                        // Found a better match (closer to target position).
>> +                        sp->pts_lo = pts_tb;
>> +                        sp->dts_lo = dts_tb;
>> +                        sp->pos_lo = pos;
>> +                    }
>> +                }
>> +                if (pos >= timestamp) {
>> +                    // Keyframe found after target position.
>> +                    if (!sp->found_hi) {
>> +                        // Found first keyframe higher than target position.
>> +                        sp->found_hi = 1;
>> +                        (*found_hi)++;
>> +                        sp->pts_hi = pts_tb;
>> +                        sp->dts_hi = dts_tb;
>> +                        sp->pos_hi = pos;
>> +                        if (*found_hi >= keyframes_to_find && sp->term_pos == INT64_MAX) {
>> +                            // We found high frame for all. They may get updated
>> +                            // to TS closer to target TS in later iterations (which
>> +                            // will stop at start position of previous iteration).
>> +                            break;
>> +                        }
>> +                    } else if (sp->pos_hi > pos) {
>> +                        // Found a better match (actually, shouldn't happen).
>> +                        sp->pts_hi = pts_tb;
>> +                        sp->dts_hi = dts_tb;
>> +                        sp->pos_hi = pos;
>> +                    }
>> +                }
>> +            } else {
>> +                // Seek by timestamp.
>> +                if (pts_tb <= timestamp + sp->epsilon) {
>> +                    // Keyframe found before target timestamp.
>> +                    if (!sp->found_lo) {
>> +                        // Found first keyframe lower than target timestamp.
>> +                        sp->found_lo = 1;
>> +                        (*found_lo)++;
>> +                        sp->pts_lo = pts_tb;
>> +                        sp->dts_lo = dts_tb;
>> +                        sp->pos_lo = pos;
>> +                    } else if (sp->pts_lo < pts_tb) {
>> +                        // Found a better match (closer to target timestamp).
>> +                        sp->pts_lo = pts_tb;
>> +                        sp->dts_lo = dts_tb;
>> +                        sp->pos_lo = pos;
>> +                    }
>> +                }
>> +                if (pts_tb + sp->epsilon >= timestamp) {
>> +                    // Keyframe found after target timestamp.
>> +                    if (!sp->found_hi) {
>> +                        // Found first keyframe higher than target timestamp.
>> +                        sp->found_hi = 1;
>> +                        (*found_hi)++;
>> +                        sp->pts_hi = pts_tb;
>> +                        sp->dts_hi = dts_tb;
>> +                        sp->pos_hi = pos;
>> +                        if (*found_hi >= keyframes_to_find && sp->term_pos == INT64_MAX) {
>> +                            // We found high frame for all. They may get updated
>> +                            // to TS closer to target TS in later iterations (which
>> +                            // will stop at start position of previous iteration).
>> +                            break;
>> +                        }
>
> If I understood correctly, at first iteration you should find high
> keyframes for all streams.
Not necessarily. Normally, yes, but there is a special case: a key frame
in a stream that begins before the rough position and ends after it.
That frame won't be returned in the first iteration, only possibly a
later key frame. It will only be found in a later iteration (or not at
all, if seeking near the end of the file).
>
> If SEEK_BYTE is requested and the pos you are at is > max_ts (max
> position) or term_pos was earlier at the previous iteration, then you
> can break earlier after read_frame anyway.
>
> Do I miss something ?
Again, interleaved frames in MPEG. This makes the stuff somewhat complex.
>
>> +                    } else if (sp->pts_hi > pts_tb) {
>> +                        // Found a better match (actually, shouldn't happen).
>> +                        sp->pts_hi = pts_tb;
>> +                        sp->dts_hi = dts_tb;
>> +                        sp->pos_hi = pos;
>> +                    }
>
> Duplicate code between SEEK_BYTE and timestamps. I believe you only
> need one hi and one lo per stream (will be set to either pos or
> timestamp), you don't need both pts and dts. Please explain why if
> that's not possible.
In the SEEK_BYTE case, I only need pos. In the normal case, I need pos
and pts. As for dts, I'm going to solve this differently (see below),
since the current DTS update code is inexact, which causes problems with
MPEG-PS streams.
>
>> + }
>> + }
>> + }
>> +
>> + // Clean up the parser.
>> + av_read_frame_flush(s);
>> +}
>> +
>> +int64_t ff_gen_syncpoint_search(AVFormatContext *s,
>> + int stream_index,
>> + int64_t pos,
>> + int64_t ts_min,
>> + int64_t ts,
>> + int64_t ts_max,
>> + int flags)
>> +{
>> + AVSyncPoint sync[MAX_STREAMS], *sp;
>> + AVStream *st;
>> + int i;
>> + int keyframes_to_find = 0;
>> + int64_t curpos;
>> + int64_t step;
>> + int found_lo = 0, found_hi = 0;
>> + int64_t min_distance;
>> + int64_t min_pos = 0;
>> +
>> +    if (stream_index >= 0 && !(flags & AVSEEK_FLAG_BYTE)) {
>> +        // Rescale timestamps to AV_TIME_BASE, if timestamp of a reference stream given.
>> +        st = s->streams[stream_index];
>> +        if (ts != AV_NOPTS_VALUE)
>> +            ts = av_rescale(ts, AV_TIME_BASE * (int64_t)st->time_base.num,
>> +                            st->time_base.den);
>> +        if (ts_min != INT64_MIN)
>> +            ts_min = av_rescale(ts_min, AV_TIME_BASE * (int64_t)st->time_base.num,
>> +                                st->time_base.den);
>> +        if (ts_max != INT64_MAX)
>> +            ts_max = av_rescale(ts_max, AV_TIME_BASE * (int64_t)st->time_base.num,
>> +                                st->time_base.den);
>> +    }
>
> I don't think rescaling is needed for now since in mpegts all streams
> have same timebase.
It is, since stream_index can be -1, which means the timestamps are in
AV_TIME_BASE. So I'd have to rescale in the stream_index == -1 case even
if the code were MPEG-specific. The rest of the code works in
AV_TIME_BASE anyway, and it's generic enough for other formats as well.
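In other words (a sketch of the intent, essentially what the code above
does):

    if (stream_index >= 0 && !(flags & AVSEEK_FLAG_BYTE)) {
        // caller passed timestamps in the reference stream's time base,
        // bring them to AV_TIME_BASE
        AVStream *st = s->streams[stream_index];
        ts = av_rescale(ts, AV_TIME_BASE * (int64_t)st->time_base.num,
                        st->time_base.den);
    }
    // with stream_index == -1, ts is already in AV_TIME_BASE and the rest
    // of the routine works in AV_TIME_BASE regardless of the format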
>
>> +    // Initialize syncpoint structures for each stream.
>> +    for (i = 0; i < s->nb_streams; ++i) {
>> +        sp = &sync[i];
>> +        st = s->streams[i];
>> +        if (s->streams[i]->discard < AVDISCARD_ALL) {
>> +            ++keyframes_to_find;
>> +            sp->pos_lo = INT64_MAX;
>> +            sp->pts_lo = INT64_MAX;
>> +            sp->dts_lo = INT64_MAX;
>> +            sp->found_lo = 0;
>> +            sp->pos_hi = INT64_MAX;
>> +            sp->pts_hi = INT64_MAX;
>> +            sp->dts_hi = INT64_MAX;
>> +            sp->found_hi = 0;
>> +            sp->epsilon = AV_TIME_BASE * (int64_t)st->time_base.num / st->time_base.den;
>> +            sp->term_pos = INT64_MAX;
>> +            sp->first_pos = AV_NOPTS_VALUE;
>> +            sp->terminated = 0;
>> +            sp->active = 1;
>> +        } else {
>> +            sp->active = 0;
>> +        }
>> +    }
>> +
>> + if (keyframes_to_find == 0) {
>> + // No stream active, error.
>> + return -1;
>> + }
>> +
>> +    // Find keyframes in all active streams with timestamp/position just before
>> +    // and just after requested timestamp/position.
>> +    step = 1024;
>> +    for (;;) {
>> +        curpos = pos - step;
>> +        if (curpos < 0)
>> +            curpos = 0;
>> +        url_fseek(s->pb, curpos, SEEK_SET);
>> +        search_keyframe_multi_partial(s, ts, flags, sync, keyframes_to_find,
>> +                                      &found_lo, &found_hi);
>> +        if (found_lo == keyframes_to_find && found_hi == keyframes_to_find)
>> +            break; // have all keyframes we wanted
>
> You can shortcut by checking if user specified a max_pos > ts or
> min_pos < ts. In this case only found_lo or found_hi matters.
I'm not sure.
The max_pos/min_pos/max_ts/min_ts don't say "don't give me anything
outside this range", but rather "make sure my decoder output synchronizes
in this range". I.e., it's perfectly OK to return a position before
min_pos/min_ts.
Using only found_hi for a forward seek is not OK, since one stream might
have a key frame past the requested timestamp, while one or more other
streams have key frames before it. So even though the decoder
synchronizes at a timestamp greater than the requested one, we need to
start reading before the requested timestamp in order to synchronize as
close to it as possible.
Using only found_lo for a backward seek might work, but why introduce a
special case? It's necessary anyway to scan each stream up to the high
key frame, since our rough seeking position might be too early, so there
can be several lo key frames after this position.
>
>> +        if (curpos == 0)
>> +            break; // cannot go back anymore
>> +        step *= 2;
>> +        // switch termination positions
>> +        for (i = 0; i < s->nb_streams; ++i) {
>> +            sp = &sync[i];
>> +            if (sp->active) {
>> +                sp->term_pos = sp->first_pos;
>> +                sp->first_pos = AV_NOPTS_VALUE;
>> +                sp->terminated = 0;
>> +            }
>
> Are both first_pos and term_pos needed ?
Yes. first_pos is the position of the first frame found in the current
iteration. It is copied to term_pos, so the next iteration knows this
stream has already been read up to this position and any frames coming
afterwards are ignored. But other streams can still have frames coming
in. Only after all streams have reached their term_pos do we know that we
have really read all the frames needed.
BTW, this will have to be adjusted by keeping first_pos only if the frame
has a known timestamp; otherwise we could miss a key frame. I think this
is one of the problems with using the routine for MPEG-PS.
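The adjustment I have in mind would look roughly like this (sketch only):

    if (sp->first_pos == AV_NOPTS_VALUE && pts != AV_NOPTS_VALUE) {
        // only remember the termination position for the next iteration if the
        // frame has a usable timestamp; otherwise a keyframe whose timestamp is
        // not yet known could be skipped in the next iteration
        sp->first_pos = pos;
    }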
>
>> +        }
>> +
>> +    // Find actual position to start decoding so that decoder synchronizes
>> +    // closest to ts and update timestamps in streams.
>> +    pos = INT64_MAX;
>> +
>> +    for (i = 0; i < s->nb_streams; ++i) {
>> +        sp = &sync[i];
>> +        if (sp->active) {
>> +            min_distance = INT64_MAX;
>> +            if (flags & AVSEEK_FLAG_BYTE) {
>> +                // Find closest byte position to requested position, within min/max limits.
>> +                if (sp->pos_lo != INT64_MAX && ts_min <= sp->pos_lo && sp->pos_lo <= ts_max) {
>> +                    // low position is in range
>> +                    min_distance = ts - sp->pos_lo;
>> +                    min_pos = sp->pos_lo;
>> +                }
>> +                if (sp->pos_hi != INT64_MAX && ts_min <= sp->pos_hi && sp->pos_hi <= ts_max) {
>> +                    // high position is in range, check distance
>> +                    if (sp->pos_hi - ts < min_distance) {
>> +                        min_distance = sp->pos_hi - ts;
>> +                        min_pos = sp->pos_hi;
>> +                    }
>> +                }
>> +            } else {
>> +                // Find timestamp closest to requested timestamp within min/max limits.
>> +                if (sp->pts_lo != INT64_MAX &&
>> +                    ts_min <= sp->pts_lo + sp->epsilon && sp->pts_lo - sp->epsilon <= ts_max) {
>> +                    // low timestamp is in range
>> +                    min_distance = ts - sp->pts_lo;
>> +                    min_pos = sp->pos_lo;
>> +                }
>> +                if (sp->pts_hi != INT64_MAX &&
>> +                    ts_min <= sp->pts_hi + sp->epsilon && sp->pts_hi - sp->epsilon <= ts_max) {
>> +                    // high timestamp is in range, check distance
>> +                    if (sp->pts_hi - ts < min_distance) {
>> +                        min_distance = sp->pts_hi - ts;
>> +                        min_pos = sp->pos_hi;
>> +                    }
>> +                }
>> +            }
>
> Duplicate code between SEEK_BYTE and timestamps.
No. In the SEEK_BYTE case positions are compared, in the normal case
timestamps, so it's not a duplicate.
>
>> +            if (min_distance == INT64_MAX) {
>> +                // no timestamp is in range, cannot seek
>> +                return -1;
>> +            }
>> +            if (min_pos < pos)
>> +                pos = min_pos;
>> +        }
>> +    }
>> +
>> +    // We now have a position, so update timestamps of all streams appropriately.
>> +    for (i = 0; i < s->nb_streams; ++i) {
>> +        sp = &sync[i];
>> +        if (sp->active) {
>> +            if (sp->found_lo && sp->pos_lo >= pos) {
>> +                av_update_cur_dts(s, s->streams[i], sp->dts_lo);
>> +            } else if (sp->found_hi && sp->pos_hi >= pos) {
>> +                av_update_cur_dts(s, s->streams[i], sp->dts_hi);
>> +            } else {
>> +                // error, no suitable position found (actually impossible)
>> +                return -1;
>> +            }
>> +        }
>> +    }
>
> Do you really need update cur_dts ?
Yes. And it actually has to be updated to the correct DTS, otherwise
timestamp generation diverges and seeking is incorrect (in MPEG-PS). I'm
going to change the DTS computation to produce an exact DTS. I'm thinking
about simply putting all (stream_index, position, dts) triples
encountered into an array and then picking the closest higher DTS for the
seeking position. This array should be rather small (up to about 1.4
seconds worth of frames in MPEG, for instance, so maybe a couple hundred
entries), so I don't think any more advanced data structure is necessary.
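Roughly like this (a sketch with made-up names, not part of the current
patch):

    typedef struct {
        int     stream_index;
        int64_t pos;    // position of the first packet of the frame
        int64_t dts;    // decoding timestamp of the frame
    } DTSEntry;

    // Return the DTS of the first recorded frame of the given stream at or
    // after the chosen seek position, or AV_NOPTS_VALUE if none was recorded.
    static int64_t dts_at_or_after(const DTSEntry *e, int n,
                                   int stream_index, int64_t pos)
    {
        int     i;
        int64_t best_pos = INT64_MAX, best_dts = AV_NOPTS_VALUE;
        for (i = 0; i < n; i++) {
            if (e[i].stream_index == stream_index &&
                e[i].pos >= pos && e[i].pos < best_pos) {
                best_pos = e[i].pos;
                best_dts = e[i].dts;
            }
        }
        return best_dts;
    }

The seeking routine would fill such an array while scanning and then feed
the result to av_update_cur_dts() for each active stream.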
Regards,
Ivan