[FFmpeg-user] Illustration review, Streams and GOP Frame Reordering [was: a couple of things to look at]

Mon Mar 4 01:33:07 EET 2024

Mark:

On 2024-03-02 19:51, Mark Filipak wrote:
> I have a couple of things to look at.
>
> https://markfilipak.github.io/Video-Object-Notation/Streams.html
> https://markfilipak.github.io/Video-Object-Notation/GOP%20%26%20Frame%20Reordering.html 
>
>
> Comments are welcome. Please be brutal. 'Streams' is crucial.

Good work! I think both of these illustrations are helpful.  The Streams 
illustration has improved since the draft I saw earlier.

Regarding the Streams illustration 
<https://markfilipak.github.io/Video-Object-Notation/Streams.html>:

The macroblock to slice to picdata transition is clear. Showing 45 
macroblocks in a horizontal slice works.  Good work.

That said, I think the complete list of 45 macroblocks in a horizontal 
slice is not necessary. You could keep, say, blocks 0..11, elide blocks 
12..42 with a horizontal ellipsis, and keep blocks 43..44. With the 
horizontal space saved, make the block numbers two digits. It is hard to 
count out 45 from single-digit numbers. 00..44 would be much clearer.

The complete list of 0..29 slices is visually overwhelming, and not 
necessary.  I think you could keep slices 0..2, elide slices 3..27 with 
a vertical ellipsis, and keep slices 28..29.  That would get the slice 
structure across.
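
To make the elision concrete, here is a minimal Python sketch of the 
labelling I have in mind. The function name elided_labels and its 
parameters are my own invention for illustration, not anything taken 
from your page.

```python
def elided_labels(count, head, tail, ellipsis="..."):
    """Return zero-padded labels for items 0..count-1.

    Keeps the first `head` and the last `tail` labels and replaces
    everything in between with a single ellipsis marker.
    (Hypothetical helper, for illustration only.)
    """
    width = len(str(count - 1))              # 45 items -> 2 digits: 00..44
    labels = [f"{i:0{width}d}" for i in range(count)]
    if head + tail >= count:                 # nothing left to elide
        return labels
    return labels[:head] + [ellipsis] + labels[count - tail:]


# Macroblocks: keep 00..11, elide 12..42, keep 43..44
print(" ".join(elided_labels(45, head=12, tail=2)))

# Slices: keep 00..02, elide 03..27, keep 28..29 (one label per row)
print("\n".join(elided_labels(30, head=3, tail=2, ellipsis=":")))
```

The padding width is derived from the highest index, so the same call 
gives two-digit labels for both the macroblock row and the slice column.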

The slice structure lacks a size comment of the sort you included 
for macroblock and picdata. The full slice structure does not leave any 
room for such a comment. But the vertical ellipsis for slices 3..27 will 
be horizontally compact, so you could place a comment with the slice 
size next to the ellipsis.

Regarding the GOP & Frame Reordering illustration, 
<https://markfilipak.github.io/Video-Object-Notation/GOP%20%26%20Frame%20Reordering.html>:

Time is plastic in illustration space also. Some term definitions 
appear after the first use of those terms. It would be easier to 
follow if each definition came at first use.

The opening text, "an I-frame followed by P-frames and optional 
B-frames", could be improved by adding term definitions, e.g. "an 
I-frame (complete unto itself, sometimes called a keyframe) followed by 
P-frames (predicted from the preceding I-frame or P-frame) and optional 
B-frames (bidirectionally predicted from the nearest preceding and 
following I-frame or P-frame)".

The first rectangle, GOP specimen, gives a particular frame order.  
Which order is this? Is this the order of frames in the incoming data 
stream, before reordering? That specimen seems to be in PTS order.  Is 
this necessary, or coincidental?

What reordering happens in the first step?  Is it reordering from 
incoming stream order to DTS order?

I don't get how the conveyor belt metaphor and illustrations add value. 
You could just show frames in DTS order, and say that the decoder 
operates on them in DTS sequence.  Maybe show the frames in DTS order, 
with arrows from each P-frame to the corresponding I-frame, and from 
each B-frame to the corresponding I-frame and P-frame.

Then show arrows from that sequence down to the same frames, in PTS order.
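
Here is a rough Python sketch of those two rows, under my own 
assumptions. The seven-frame GOP, the Frame class, and the reference 
lists are all hypothetical; they are not taken from your specimen. I am 
assuming each P-frame references the preceding I-frame or P-frame, and 
each B-frame references the nearest I-frame or P-frame on either side of 
it in presentation order.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    kind: str                                  # "I", "P" or "B"
    pts: int                                   # position in presentation order
    refs: list = field(default_factory=list)   # pts of the frames it depends on


# Hypothetical GOP, listed in decode (DTS) order.
decode_order = [
    Frame("I", 0),
    Frame("P", 3, refs=[0]),
    Frame("B", 1, refs=[0, 3]),
    Frame("B", 2, refs=[0, 3]),
    Frame("P", 6, refs=[3]),
    Frame("B", 4, refs=[3, 6]),
    Frame("B", 5, refs=[3, 6]),
]


def label(frame):
    return f"{frame.kind}{frame.pts}"


# The decoder walks the frames in DTS sequence, so every reference
# must already have been decoded when a frame needs it.
decoded = set()
for frame in decode_order:
    assert all(ref in decoded for ref in frame.refs), label(frame)
    decoded.add(frame.pts)

# Reordering for display is then just a sort by PTS.
presentation_order = sorted(decode_order, key=lambda f: f.pts)

print("DTS order:", " ".join(label(f) for f in decode_order))
print("PTS order:", " ".join(label(f) for f in presentation_order))
for frame in decode_order:
    if frame.refs:
        print(f"  {label(frame)} <- references pts {frame.refs}")
```

The refs lists are the arrows I am describing: each one points from a 
frame to the frames it is predicted from, and the sort is the set of 
arrows from the DTS row down to the PTS row.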

It is not clear to me why the final two B-frames have later DTSs than 
the following I-frame, but earlier PTSs.  Why would these B-frames not 
be relative to the first I-frame?  If they are relative to the second 
I-frame, why would that I-frame not have an earlier DTS?  Are the 
B-frames relative to the final P-frame before them?  What is going on 
visually that the encoder would choose to sequence things this way?

It is great to have a reference to the specification which you are 
illustrating, "ITU-T H.262 (02/2012)". It would be even better to have 
that at the beginning. The illustration might explain its goal, e.g. 
"This illustrates the Group of Pictures and frame reordering operations 
as described in ITU-T H.262 (02/2012)."

And, these diagrams are amazing works of character graphics. They would 
be even more amazing as works of vector graphics. But drawing them in 
vector graphics would require a different skill-set.

Keep up the good work!