[NUT-devel] r20862 - trunk/DOCS/tech/nut.txt

Måns Rullgård mru at inprovide.com
Mon Nov 13 13:56:56 CET 2006


Michael Niedermayer said:
> Hi
>
> On Sun, Nov 12, 2006 at 09:42:17PM -0500, Rich Felker wrote:
>> On Sun, Nov 12, 2006 at 01:24:57PM +0100, michael wrote:
>> > Author: michael
>> > Date: Sun Nov 12 13:24:57 2006
>> > New Revision: 20862
>> >
>> > Modified:
>> >    trunk/DOCS/tech/nut.txt
>> >
>> > Log:
>> > least restrictive dts ordering rule which ensures frames are in decoding order
>> >
>> >
>> > Modified: trunk/DOCS/tech/nut.txt
>> > ==============================================================================
>> > --- trunk/DOCS/tech/nut.txt	(original)
>> > +++ trunk/DOCS/tech/nut.txt	Sun Nov 12 13:24:57 2006
>> > @@ -670,6 +670,8 @@
>> >      Pts of all frames in all streams MUST be bigger or equal to dts of all
>> >      previous frames in all streams, compared in common timebase. (EOR
>> >      frames are NOT exempt from this rule)
>> > +    Dts of all frames MUST be bigger or equal to dts of all previous frames
>> > +    in the same stream
>>
>> This is guaranteed by the definition of DTS and the above condition on
>> PTS, isn't it?
>
> i dont know but just looking at the definition of decode_delay gives me
> doubt
> "decode_delay
>     maximum time between input and output for a codec, used to generate
>     dts from pts
>     is set to 0 for streams without B-frames, and set to 1 for streams with
>     B-frames, may be larger for future codecs
>     decode_delay MUST NOT be set higher than necessary for a codec."
>
>
> what is the "maximum time between input and output for a codec" ?
> its not the time between a frame input and output IPBBB ->IBBBP (=3)
> its neither the smallest number so that dts are monotone (IPPPP)
> and codec is decoder + encoder that too makes no sense
>
> i dont know what i was thinking when i wrote that :(
>
>
> its rather
> dts of a frame is the time when it is input into the decoder
> pts is the time of presentation of the first corresponding decoded sample
> and decode_delay is then the size of the timestamp sorting buffer that
> the above has a solution for
>
> note, the above is a little fuzzy i know but if we define pts like
> pts is the time of presentation of the first sample affected by the frame
> then IPBBB would have I.pts=0 P.pts=1 as the b frame is affected by P
>
> comments, objections?
> (if there are no objections then ill change that in the spec)

Not that my word counts for much around here, but that is not a good definition.
The PTS must be defined as the presentation time of the first frame/sample that
is completed by that coded frame.  With your suggested definition, in a coded
IPBB sequence (displayed IBBP) the PTS of the P and B frames would all be 1.  This
is clearly not a good situation.
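
To make the IPBB case concrete, here is a minimal Python sketch.  It assumes
display times 0..3 for the displayed order IBBP, takes the PTS as the
presentation time of the frame completed by each coded frame, and reads the
quoted "timestamp sorting buffer" as a min-heap holding decode_delay entries.
The names dts_from_pts and is_non_decreasing and the -1 placeholders are
illustrative assumptions, not something taken from nut.txt.

```python
import heapq


def dts_from_pts(pts_in_coded_order, decode_delay):
    """Derive dts by passing pts through a sorting buffer of decode_delay entries."""
    # Assumption: the buffer starts pre-filled with placeholders that are
    # smaller than any real pts, so the first frames get a dts below their pts.
    buf = [-1] * decode_delay
    heapq.heapify(buf)
    dts = []
    for pts in pts_in_coded_order:
        # Insert the new pts and remove the smallest buffered timestamp;
        # the removed value becomes the dts of the current frame.
        dts.append(heapq.heappushpop(buf, pts))
    return dts


def is_non_decreasing(values):
    """True if every value is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Coded order I P B B, displayed I B B P at times 0, 1, 2, 3.
coded_pts = [0, 3, 1, 2]
print(dts_from_pts(coded_pts, 0))  # [0, 3, 1, 2]
print(dts_from_pts(coded_pts, 1))  # [-1, 0, 1, 2]
```

With decode_delay = 0 the dts simply equal the pts and go backwards at the
first B frame; with decode_delay = 1 they come out non-decreasing and no
frame has a pts smaller than its dts.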

Just for reference, here are the definitions of PTS and DTS from ISO 13818-1:

PTS (presentation time stamp) - Presentation times shall be related to
decoding times as follows: The PTS is a 33-bit number coded in three
separate fields. It indicates the time of presentation, tpn(k), in the
system target decoder of a presentation unit k of elementary stream
n. The value of PTS is specified in units of the period of the system
clock frequency divided by 300 (yielding 90 kHz). The presentation
time is derived from the PTS according to equation 2-11 below. Refer
to 2.7.4 for constraints on the frequency of coding presentation
timestamps.

    PTS(k) = ((system_clock_frequency × tpn(k)) DIV 300) % 2^33  (2-11)

where tpn(k) is the presentation time of presentation unit Pn(k).
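
As a rough illustration of this equation, the Python sketch below converts a
time in seconds into the 33-bit timestamp value.  It assumes a nominal system
clock of 27 MHz (which is what "divided by 300 (yielding 90 kHz)" implies)
and non-negative times; the function name timestamp_90khz is made up for
this example.

```python
from fractions import Fraction

SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz, nominal; 27 MHz / 300 = 90 kHz


def timestamp_90khz(t_seconds):
    """Return the 33-bit timestamp for a non-negative time given in seconds."""
    # Count whole system clock ticks first, then apply DIV 300 and the
    # modulo 2^33 wrap from the equation.
    ticks = int(SYSTEM_CLOCK_FREQUENCY * Fraction(t_seconds))
    return (ticks // 300) % (1 << 33)


print(timestamp_90khz(1))                # 90000
print(timestamp_90khz(Fraction(1, 25)))  # 3600, one frame period at 25 fps
```

Because of the modulo, the value wraps to 0 after 2^33 / 90000 seconds,
which is roughly 26.5 hours.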

In the case of audio, if a PTS is present in PES packet header it
shall refer to the first access unit commencing in the PES packet. An
audio access unit commences in a PES packet if the first byte of the
audio access unit is present in the PES packet.

In the case of video, if a PTS is present in a PES packet header it
shall refer to the access unit containing the first picture start code
that commences in this PES packet. A picture start code commences in
PES packet if the first byte of the picture start code is present in
the PES packet.

For audio presentation units (PUs), video PUs in low_delay sequences,
and B-pictures, the presentation time tpn(k) shall be equal to the
decoding time tdn(k).

For I- and P-pictures in non-low_delay sequences and in the case when
there is no decoding discontinuity between access units (AUs) k and
k', the presentation time tpn(k) shall be equal to the decoding time
tdn(k') of the next transmitted I- or P-picture (refer to 2.7.5). If
there is a decoding discontinuity, or the stream ends, the difference
between tpn(k) and tdn(k) shall be the same as if the original stream
had continued without a discontinuity and without ending.
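
The two rules above can be sketched in Python for a non-low_delay video
stream.  This is only an illustration under simplifying assumptions: frames
are given as (picture type, decoding time) pairs in transmission order, there
are no discontinuities, and the last I- or P-picture is left as None rather
than extrapolated.  The name presentation_times is hypothetical.

```python
def presentation_times(frames):
    """Map (picture_type, td) pairs in transmission order to presentation times."""
    tp = []
    for i, (ptype, td) in enumerate(frames):
        if ptype == "B":
            # B-pictures are presented at the moment they are decoded.
            tp.append(td)
        else:
            # I- and P-pictures are presented when the next transmitted
            # I- or P-picture is decoded; None if there is no such picture.
            later = (t for p, t in frames[i + 1:] if p in ("I", "P"))
            tp.append(next(later, None))
    return tp


# Transmission order I P B B P B B with decoding times 0..6.
frames = [("I", 0), ("P", 1), ("B", 2), ("B", 3), ("P", 4), ("B", 5), ("B", 6)]
print(presentation_times(frames))  # [1, 4, 2, 3, None, 5, 6]
```

Sorting by the resulting presentation times gives the display order
I B B P B B, with the final P-picture following once its time is known.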

    NOTE 1 - A low_delay sequence is a video sequence in which the
    low_delay flag is set (refer to 6.2.2.3 of ITU-T Rec. H.262 | ISO/IEC
    13818-2).

If there is filtering in audio, it is assumed by the system model that
filtering introduces no delay, hence the sample referred to by PTS at
encoding is the same sample referred to by PTS at decoding.

DTS (decoding time stamp) - The DTS is a 33-bit number coded in three
separate fields. It indicates the decoding time, tdn(j), in the system
target decoder of an access unit j of elementary stream n. The value
of DTS is specified in units of the period of the system clock
frequency divided by 300 (yielding 90 kHz). The decoding time is
derived from the DTS according to equation 2-12 below:

    DTS(j) = ((system_clock_frequency × tdn(j)) DIV 300) % 2^33  (2-12)

where tdn(j) is the decoding time of access unit An(j).

In the case of video, if a DTS is present in a PES packet header it
shall refer to the access unit containing the first picture start code
that commences in this PES packet. A picture start code commences in
PES packet if the first byte of the picture start code is present in
the PES packet.

-- 
Måns Rullgård
mru at inprovide.com


