[FFmpeg-devel] [PATCH] AVCHD/H.264 parser: determination of frame type, question about timestamps
Ivan Schreter
schreter
Sat Jan 17 20:33:15 CET 2009
Hello,
I am trying to get kdenlive working with AVCHD files from my new
camcorder (Panasonic HDC-SD9). There are several problems in ffmpeg,
which prevent it:
- first of all, the h264 parser doesn't fill in the picture type, which
confuses libavformat
- timestamp and key frame handling is broken in libavformat
- seeking is broken
Regarding the first problem, I wrote a patch (see below) which
determines the picture type (I/P/B) from the slice header of the first
slice in the frame. This allows for correct computation of timestamps
later (another patch will follow, but see also below). As a side
effect, since the frame type is now correctly reported to libavformat,
it can properly determine which frames are key frames (i.e., I-frames).
Would someone please comment on this patch and, if it is acceptable,
check it in for me?
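As a standalone illustration of what the patch reads from the slice
header, here is a minimal sketch that decodes the two leading Exp-Golomb
codes (first_mb_in_slice, slice_type) and maps slice_type to a picture
type letter. The names BitReader, read_ue and slice_payload_pict_type
are hypothetical helpers, not FFmpeg API, and the sketch assumes the
payload starts right after the NAL header and contains no emulation
prevention bytes in its first few bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader (hypothetical; not FFmpeg's GetBitContext). */
typedef struct {
    const uint8_t *data;
    size_t size_bits;
    size_t pos;
} BitReader;

/* Return the next bit, or -1 if the input is exhausted. */
static int read_bit(BitReader *br)
{
    int bit;
    if (br->pos >= br->size_bits)
        return -1;
    bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Read one unsigned Exp-Golomb code ue(v); return -1 on bad or short input. */
static int64_t read_ue(BitReader *br)
{
    int zeros = 0, bit, i;
    int64_t value = 1;

    /* Count leading zero bits up to the terminating 1 bit. */
    while ((bit = read_bit(br)) == 0)
        if (++zeros > 31)
            return -1;
    if (bit < 0)
        return -1;

    /* Read 'zeros' more bits behind the implicit leading 1. */
    for (i = 0; i < zeros; i++) {
        bit = read_bit(br);
        if (bit < 0)
            return -1;
        value = (value << 1) | bit;
    }
    return value - 1;
}

/* Return 'P', 'B' or 'I' for a slice payload, 'S' for SP/SI slices,
 * and '?' if the header cannot be read. */
static char slice_payload_pict_type(const uint8_t *payload, size_t size)
{
    /* slice_type 0..4 = P, B, I, SP, SI; 5..9 repeat the same types. */
    static const char types[5] = { 'P', 'B', 'I', 'S', 'S' };
    BitReader br = { payload, size * 8, 0 };
    int64_t slice_type;

    if (read_ue(&br) < 0)            /* first_mb_in_slice, ignored */
        return '?';
    slice_type = read_ue(&br);
    if (slice_type < 0 || slice_type > 9)
        return '?';
    return types[slice_type % 5];
}
```

For example, the single payload byte 0x88 encodes first_mb_in_slice = 0
followed by slice_type = 7, which this sketch reports as 'I'.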
As for the timestamps, libavformat/utils.c contains a complicated
algorithm that guesses timestamps for frames which are missing DTS/PTS.
IMHO this can be done much more easily and elegantly in
compute_pkt_fields(), in roughly this form:
    if (pkt->dts == AV_NOPTS_VALUE)
    {
        if(pkt->pts != AV_NOPTS_VALUE) {
            // set DTS same as PTS
            pkt->dts = pkt->pts;
        } else if (st->last_IP_pts != AV_NOPTS_VALUE) {
            // second half-frame of interlaced picture, same DTS/PTS
            pkt->pts = st->last_pts;
            pkt->dts = st->last_dts;
        }
    }
    ...
    st->last_pts = pkt->pts;
    st->last_dts = pkt->dts;
In any case, when I activate this workaround and also change the
calculation of the current DTS in compute_pkt_fields() so that it is
not offset by the frame duration, I can decode interlaced AVCHD video
without skipped or duplicated frames (footage from my camcorder as well
as from Canon and Sony AVCHD camcorders). I believe this algorithm
should always work, given the AVCHD constraints on DTS/PTS in the
headers. I checked the resulting transcoded video frame by frame, and
it seems to be correct (and lip-synced as well).
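The fallback rule from the snippet earlier in this message can be tried
in isolation with a minimal sketch like the following. Packet,
StreamState and NOPTS are hypothetical stand-ins for AVPacket, AVStream
and AV_NOPTS_VALUE, and the sketch assumes the "previous packet" check
is done on last_dts rather than on last_IP_pts. It only covers the part
shown in the snippet; a packet that carries a DTS but no PTS is left
untouched.

```c
#include <stdint.h>

#define NOPTS INT64_MIN   /* stand-in for AV_NOPTS_VALUE */

/* Hypothetical, reduced versions of AVPacket and AVStream. */
typedef struct { int64_t pts, dts; } Packet;
typedef struct { int64_t last_pts, last_dts; } StreamState;

static void stream_state_init(StreamState *st)
{
    st->last_pts = NOPTS;
    st->last_dts = NOPTS;
}

/* Fill in missing timestamps, then remember them for the next packet. */
static void fill_missing_timestamps(StreamState *st, Packet *pkt)
{
    if (pkt->dts == NOPTS) {
        if (pkt->pts != NOPTS) {
            /* only a PTS is present: use it as the DTS as well */
            pkt->dts = pkt->pts;
        } else if (st->last_dts != NOPTS) {
            /* second half-frame: reuse the first half-frame's DTS/PTS */
            pkt->pts = st->last_pts;
            pkt->dts = st->last_dts;
        }
    }
    st->last_pts = pkt->pts;
    st->last_dts = pkt->dts;
}
```

Feeding it a packet with both timestamps followed by one with neither
gives the second packet the same DTS/PTS as the first.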
What I found quite odd is that AVCHD camcorders apparently produce two
separate half-frames (fields) per frame of interlaced video. For an
I-frame, they actually produce an I-frame with DTS/PTS set, followed by
a P-frame with no DTS/PTS. Then 4 B-frames follow, in two pairs; in each
pair the first half-frame has just a DTS and the second half-frame has
neither DTS nor PTS. Then two P-frames follow, the first one having
both DTS/PTS set and the second one having nothing set.
So it looks like the camcorders really code half-pictures at double the
frame rate. I'm wondering, though, what is correct: should the second
half-picture have exactly the same DTS/PTS as the first, or should it be
offset by half the frame duration? Does anyone know? IMHO it should be
offset (in the example above it's not), but as I'm no expert on the
MPEG format, I don't know. Further, should those half-frames carry a
frame duration of 1/25th of a second, or a halved duration of 1/50th of
a second (depending on the frame rate, of course)? Would anybody care
to elaborate?
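To put numbers on the two candidate durations: MPEG transport stream
timestamps use a 90 kHz clock, so the question is whether each
half-frame should span one frame period or half of it. The sketch below
only computes both values; it does not settle which one is correct. The
function names are hypothetical, and the results are truncated to whole
ticks.

```c
#include <stdint.h>

#define MPEG_TS_CLOCK_HZ 90000   /* PTS/DTS tick rate in MPEG-TS */

/* Ticks per full frame at a frame rate of fps_num/fps_den. */
static int64_t ticks_per_frame(int64_t fps_num, int64_t fps_den)
{
    return MPEG_TS_CLOCK_HZ * fps_den / fps_num;
}

/* Ticks per half-frame (field): half of the frame period. */
static int64_t ticks_per_field(int64_t fps_num, int64_t fps_den)
{
    return MPEG_TS_CLOCK_HZ * fps_den / (2 * fps_num);
}
```

At 25 frames per second this gives 3600 ticks per frame (1/25 s) and
1800 ticks per half-frame (1/50 s).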
To avoid bigger changes in libavformat, I could probably move the
determination of DTS/PTS into the h264 parser/decoder, so that the
timestamps already come out of h264 correct and libavformat doesn't
have to guess them. But the issue of adding the frame duration to the
last DTS in order to compute the "current" DTS in libavformat remains.
This causes frame drops/duplicates: ffmpeg thinks a frame came too late
(current DTS > frame DTS) and drops it, and then, since a frame is
missing, it duplicates the previous one.
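The effect described above can be reproduced with a deliberately
simplified model; this is an assumption-laden sketch, not ffmpeg's
actual logic. Each packet's DTS is compared against a running "current
DTS", which is then set to the packet DTS plus a step. The function
name count_late_packets is hypothetical, and the input assumes both
half-frames of a picture share the same DTS, as in the footage
described in this message.

```c
#include <stdint.h>

/* Count packets whose DTS lies behind the running "current DTS".
 * After each packet, the current DTS becomes that packet's DTS plus
 * 'step' (the frame duration, or 0 to disable the offset). */
static int count_late_packets(const int64_t *dts, int n, int64_t step)
{
    int late = 0;
    int i;
    int64_t cur_dts = INT64_MIN;   /* nothing seen yet */

    for (i = 0; i < n; i++) {
        if (dts[i] < cur_dts)
            late++;                /* would be treated as arriving too late */
        cur_dts = dts[i] + step;
    }
    return late;
}
```

With half-frame DTS values 0, 0, 3600, 3600, 7200, 7200 and a step of
3600, every second half-frame counts as late in this model; with a step
of 0, none does.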
Again, would anybody care to comment on this? I don't have a patch
ready for it yet, since I'm still experimenting with it.
The third problem, seeking in the AVCHD format, is probably a bit more
complex; I haven't looked at it yet.
Thanks & regards,
Ivan
And here is the patch for h264_parser.c:
--- h264_parser.c (revision 16655)
+++ h264_parser.c (working copy)
@@ -26,6 +26,8 @@
  */
 #include "parser.h"
+#include "golomb.h"
+#include "h264data.h"
 #include "h264_parser.h"
 #include <assert.h>
@@ -96,6 +98,94 @@
     return i-(state&5);
 }
+static int h264_extract_headers(AVCodecParserContext *s,
+                                AVCodecContext *avctx,
+                                const uint8_t *buf, int buf_size)
+{
+    const uint8_t *buf_end;
+    H264Context *h = s->priv_data;
+    buf_end = buf + buf_size;
+
+    if (h->is_avc && h->nal_length_size < 1)
+        return -1; /* AVC not inited yet */
+
+    while (buf < buf_end) {
+        int nalsize = 0;
+        int buf_index = 0;
+        int i;
+        uint8_t nal_hdr;
+        uint8_t nal_type;
+        int hdrsize = 1;
+
+        if (h->is_avc) {
+            nalsize = 0;
+            for (i = 0; i < h->nal_length_size; i++)
+                nalsize = (nalsize << 8) | buf[buf_index++];
+            if (nalsize <= 1 || (buf + nalsize > buf_end)) {
+                if (nalsize == 1) {
+                    ++buf;
+                    --buf_size;
+                    continue;
+                } else {
+                    av_log(h->s.avctx, AV_LOG_ERROR, "AVC: nal size %d\n", nalsize);
+                    break;
+                }
+            }
+        } else {
+            // start code prefix search
+            for (; buf_index + 3 < buf_size; buf_index++){
+                // This should always succeed in the first iteration.
+                if(buf[buf_index] == 0 && buf[buf_index+1] == 0 && buf[buf_index+2] == 1)
+                    break;
+            }
+
+            if (buf_index + 3 >= buf_size)
+                break;
+
+            buf_index += 3;
+        }
+
+        buf += buf_index;
+
+        /* NAL header 1b zero, 2b nal_ref_idc, 5b nal_unit_type */
+        nal_hdr = *buf;
+        nal_type = nal_hdr & 0x1f;
+        if (nal_hdr & 0x80) {
+            av_log(h->s.avctx, AV_LOG_ERROR, "invalid NAL header (%x)\n", nal_hdr);
+            return -1;
+        }
+        if (nal_type == 14 || nal_type == 20)
+            hdrsize += 3;
+        buf += hdrsize;
+
+        if (nal_type == NAL_SLICE) {
+            /* Decode picture type. It contains slice type in second
+             * variable-sized integer in buffer */
+            GetBitContext ctx;
+            int slice_type;
+            init_get_bits(&ctx, buf, 8 * (buf_end - buf));
+            /*first_mb_in_slice =*/ get_ue_golomb_31(&ctx);
+            slice_type = get_ue_golomb_31(&ctx);
+            if (slice_type > 9) {
+                av_log(h->s.avctx, AV_LOG_ERROR, "slice type too large (%d)\n", slice_type);
+                return -1;
+            }
+            if (slice_type > 4)
+                slice_type -= 5;
+
+            slice_type = golomb_to_pict_type[slice_type];
+            s->pict_type= slice_type;
+            break; /* no more data necessary, save some time */
+        }
+
+        if (h->is_avc)
+            buf += nalsize - buf_index;
+        buf_size = buf_end - buf;
+    }
+
+    return 0;
+}
+
 static int h264_parse(AVCodecParserContext *s,
                       AVCodecContext *avctx,
                       const uint8_t **poutbuf, int *poutbuf_size,
@@ -122,6 +212,9 @@
         }
     }
+    /* we have a full frame, get picture type from headers */
+    h264_extract_headers(s, avctx, buf, buf_size);
+
     *poutbuf = buf;
     *poutbuf_size = buf_size;
     return next;