[FFmpeg-devel] [PATCH] AVCHD/H.264 parser: determination of frame type, question about timestamps

Sat Jan 17 20:33:15 CET 2009

Hello,

I am trying to get kdenlive working with AVCHD files from my new
camcorder (Panasonic HDC-SD9). Several problems in ffmpeg currently
prevent this:
  - first of all, the h264 parser doesn't fill in the picture type,
which confuses libavformat
  - timestamp and key frame handling is broken in libavformat
  - seeking is broken

Regarding the first problem, I wrote a patch (see below) which
determines the picture type (I/P/B) from the slice header of each
frame. This allows for correct computation of timestamps later (another
patch will follow, but see also below). As a side effect, since the
frame type is now correctly reported to libavformat, it can properly
determine which frames are key frames (i.e., I-frames).
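
To illustrate the idea outside of ffmpeg, here is a minimal standalone
sketch of reading slice_type from the start of a slice. It assumes the
caller has already skipped the start code (or length prefix) and the
one-byte NAL header. The BitReader type and the function names are
hypothetical helpers made up for this example, not ffmpeg API, and
emulation prevention bytes are ignored for brevity. The mapping of
slice_type values 0..4 (and 5..9) to P/B/I/SP/SI follows the H.264
slice header definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *data;
    size_t size_bits;
    size_t pos;
} BitReader;

/* Returns the next bit, or -1 if the buffer is exhausted. */
static int read_bit(BitReader *br)
{
    int bit;
    if (br->pos >= br->size_bits)
        return -1;
    bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Decodes one unsigned Exp-Golomb code ue(v); -1 on truncated input. */
static int64_t read_ue(BitReader *br)
{
    int zeros = 0, bit, i;
    int64_t value = 1;

    while ((bit = read_bit(br)) == 0)
        if (++zeros > 31)
            return -1;          /* code too long for this sketch */
    if (bit < 0)
        return -1;
    for (i = 0; i < zeros; i++) {
        bit = read_bit(br);
        if (bit < 0)
            return -1;
        value = (value << 1) | bit;
    }
    return value - 1;
}

/* Returns 'P', 'B', 'I', 's' (SP) or 'i' (SI); 0 on error.
 * payload points just past the one-byte NAL header of a slice NAL. */
static char slice_picture_type(const uint8_t *payload, size_t size)
{
    static const char names[5] = { 'P', 'B', 'I', 's', 'i' };
    BitReader br;
    int64_t slice_type;

    assert(payload != NULL);
    br.data = payload;
    br.size_bits = size * 8;
    br.pos = 0;

    if (read_ue(&br) < 0)       /* first_mb_in_slice, value unused */
        return 0;
    slice_type = read_ue(&br);
    if (slice_type < 0 || slice_type > 9)
        return 0;
    return names[slice_type % 5];   /* 5..9 mean the same as 0..4 */
}
```

For example, a payload starting with the byte 0x88 (bits 1 0001000)
encodes first_mb_in_slice = 0 and slice_type = 7, which the sketch
reports as 'I'.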

Would someone please comment on this patch and, if it is acceptable,
check it in for me?

As for the timestamps, libavformat/utils.c contains a complicated
algorithm that guesses timestamps for frames which lack DTS/PTS. IMHO
this can be done much more simply and elegantly in
compute_pkt_fields(), roughly like this:

    if (pkt->dts == AV_NOPTS_VALUE)
    {
        if(pkt->pts != AV_NOPTS_VALUE) {
            // set DTS same as PTS
            pkt->dts = pkt->pts;
        } else if (st->last_IP_pts != AV_NOPTS_VALUE) {
            // second half-frame of interlaced picture, same DTS/PTS
            pkt->pts = st->last_pts;
            pkt->dts = st->last_dts;
        }
    }
    ...
    st->last_pts = pkt->pts;
    st->last_dts = pkt->dts;

In any case, when I activate this workaround and also change the
calculation of the current DTS in compute_pkt_fields() so that it is
not offset by the frame duration, I can decode interlaced AVCHD video
without skips and dups (footage from my camcorder as well as from Canon
and Sony AVCHD camcorders). I believe this algorithm should always
work, given the AVCHD constraints on DTS/PTS in the headers. I checked
the resulting transcoded video frame by frame; it seems to be correct
(and lip-synced as well).

What I found quite funny: AVCHD camcorders apparently produce two
separate half-frames (fields) per frame of interlaced video. For an
I-frame, they actually produce an I-frame with DTS/PTS set, followed by
a P-frame with no DTS/PTS. Then 4 B-frames follow, in two pairs; in
each pair the first half-frame has just a DTS and the second half-frame
has neither DTS nor PTS. Then two P-frames follow, the first having
both DTS and PTS set, the second having nothing set.
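
As a rough standalone illustration of how the fallback rule from the
snippet above behaves on such a sequence, here is a small sketch. Pkt,
StreamState, NOPTS and both function names are stand-ins invented for
this example (NOPTS plays the role of AV_NOPTS_VALUE). The sketch tests
last_dts instead of st->last_IP_pts, and it only covers the part of the
logic shown in the snippet; in particular, it does not derive a PTS for
the B half-frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOPTS INT64_MIN             /* stand-in for AV_NOPTS_VALUE */

typedef struct { int64_t pts, dts; } Pkt;                 /* hypothetical */
typedef struct { int64_t last_pts, last_dts; } StreamState;

/* Applies the fallback rule to one packet and remembers its result. */
static void fill_missing_timestamps(StreamState *st, Pkt *pkt)
{
    assert(st != NULL && pkt != NULL);
    if (pkt->dts == NOPTS) {
        if (pkt->pts != NOPTS) {
            /* only PTS known: set DTS same as PTS */
            pkt->dts = pkt->pts;
        } else if (st->last_dts != NOPTS) {
            /* second half-frame: reuse timestamps of the first one */
            pkt->pts = st->last_pts;
            pkt->dts = st->last_dts;
        }
    }
    st->last_pts = pkt->pts;
    st->last_dts = pkt->dts;
}

/* Runs the rule over a packet array, starting with no history. */
static void fill_sequence(Pkt *pkts, size_t count)
{
    StreamState st = { NOPTS, NOPTS };
    size_t i;
    for (i = 0; i < count; i++)
        fill_missing_timestamps(&st, &pkts[i]);
}
```

With made-up tick values, an I half-frame {pts 3600, dts 0} followed by
a P half-frame with nothing set ends up as two packets with identical
timestamps, while a B pair {no pts, dts 3600}, {nothing} ends up sharing
the DTS and still lacking a PTS.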

So it looks like the camcorders really code half-pictures at double the
frame rate. I'm wondering, though, which is correct: should the second
half-picture have exactly the same DTS/PTS as the first, or should it
be offset by half the frame duration? Does anyone know? IMHO it should
be offset (in the example above it's not), but as I'm no expert on the
MPEG format, I don't know. Further, should those frames carry a frame
duration of 1/25th of a second, or a halved duration of 1/50th of a
second (depending on the frame rate, of course)? Would anybody care to
elaborate?
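
To put numbers on the two options, here is a tiny sketch of the
arithmetic, assuming the usual 90 kHz MPEG system clock for PTS/DTS.
The function names are made up for this example, and the sketch does
not answer which option is correct; it only shows what each would mean
in clock ticks.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of one full frame in clock ticks, for a frame rate given
 * as the fraction fps_num/fps_den (integer division, rounds down). */
static int64_t frame_ticks(int64_t clock_hz, int64_t fps_num, int64_t fps_den)
{
    assert(clock_hz > 0 && fps_num > 0 && fps_den > 0);
    return clock_hz * fps_den / fps_num;
}

/* Timestamp of the second half-frame if it were offset by half the
 * frame duration from the first half-frame (rounds down). */
static int64_t second_field_ts(int64_t first_field_ts, int64_t frame_duration)
{
    return first_field_ts + frame_duration / 2;
}
```

At 25 fps a frame lasts 90000 / 25 = 3600 ticks, so an offset second
half-frame would sit 1800 ticks after the first. At 30000/1001 fps a
frame lasts 3003 ticks, which does not halve evenly, so the offset
would have to be rounded.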

To avoid bigger changes in libavformat, I could probably move the
determination of DTS/PTS into the h264 parser/decoder, so that they
already come out correct from h264 and libavformat doesn't have to
guess them. But the issue of adding the frame duration to the last DTS
in order to compute the "current" DTS in libavformat remains. This
causes frame drops/duplicates: ffmpeg thinks a frame came too late
(current DTS > frame DTS) and drops it; then a frame is missing, so it
duplicates the previous one.

Again, any comments on this? I don't have a patch ready for it yet,
since I'm still experimenting.

The third problem, seeking in the AVCHD format, is probably a bit more
complex; I haven't looked at it yet.

Thanks & regards,

Ivan


And here is the patch for h264_parser.c:

--- h264_parser.c       (revision 16655)
+++ h264_parser.c       (working copy)
@@ -26,6 +26,8 @@
  */
 
 #include "parser.h"
+#include "golomb.h"
+#include "h264data.h"
 #include "h264_parser.h"
 
 #include <assert.h>
@@ -96,6 +98,94 @@
     return i-(state&5);
 }
 
+static int h264_extract_headers(AVCodecParserContext *s,
+                                AVCodecContext *avctx,
+                                const uint8_t *buf, int buf_size)
+{
+    const uint8_t *buf_end;
+    H264Context *h = s->priv_data;
+    buf_end = buf + buf_size;
+
+    if (h->is_avc && h->nal_length_size < 1)
+        return -1;     /* AVC not inited yet */
+
+    while (buf < buf_end) {
+        int nalsize = 0;
+        int buf_index = 0;
+        int i;
+        uint8_t nal_hdr;
+        uint8_t nal_type;
+        int hdrsize = 1;
+
+        if (h->is_avc) {
+            nalsize = 0;
+            for (i = 0; i < h->nal_length_size; i++)
+                nalsize = (nalsize << 8) | buf[buf_index++];
+            if (nalsize <= 1 || (buf + nalsize > buf_end)) {
+                if (nalsize == 1) {
+                    ++buf;
+                    --buf_size;
+                    continue;
+                } else {
+                    av_log(h->s.avctx, AV_LOG_ERROR, "AVC: nal size %d\n", nalsize);
+                    break;
+                }
+            }
+        } else {
+            // start code prefix search
+            for (; buf_index + 3 < buf_size; buf_index++){
+                // This should always succeed in the first iteration.
+                if(buf[buf_index] == 0 && buf[buf_index+1] == 0 && buf[buf_index+2] == 1)
+                    break;
+            }
+
+            if (buf_index + 3 >= buf_size)
+                break;
+
+            buf_index += 3;
+        }
+
+        buf += buf_index;
+
+        /* NAL header 1b zero, 2b nal_ref_idc, 5b nal_unit_type */
+        nal_hdr = *buf;
+        nal_type = nal_hdr & 0x1f;
+        if (nal_hdr & 0x80) {
+            av_log(h->s.avctx, AV_LOG_ERROR, "invalid NAL header (%x)\n", nal_hdr);
+            return -1;
+        }
+        if (nal_type == 14 || nal_type == 20)
+            hdrsize += 3;
+        buf += hdrsize;
+
+        if (nal_type == NAL_SLICE) {
+            /* Decode picture type. It contains slice type in second
+             * variable-sized integer in buffer  */
+            GetBitContext ctx;
+            int slice_type;
+            init_get_bits(&ctx, buf, 8 * (buf_end - buf));
+            /*first_mb_in_slice =*/ get_ue_golomb_31(&ctx);
+            slice_type = get_ue_golomb_31(&ctx);
+            if (slice_type > 9) {
+                av_log(h->s.avctx, AV_LOG_ERROR, "slice type too large (%d)\n", slice_type);
+                return -1;
+            }
+            if (slice_type > 4)
+                slice_type -= 5;
+
+            slice_type = golomb_to_pict_type[slice_type];
+            s->pict_type= slice_type;
+            break;    /* no more data necessary, save some time */
+        }
+
+        if (h->is_avc)
+            buf += nalsize - buf_index;
+        buf_size = buf_end - buf;
+    }
+
+    return 0;
+}
+
 static int h264_parse(AVCodecParserContext *s,
                       AVCodecContext *avctx,
                       const uint8_t **poutbuf, int *poutbuf_size,
@@ -122,6 +212,9 @@
         }
     }

+    /* we have a full frame, get picture type from headers */
+    h264_extract_headers(s, avctx, buf, buf_size);
+
     *poutbuf = buf;
     *poutbuf_size = buf_size;
     return next;