[FFmpeg-user] Illustration review, Streams and GOP Frame Reordering [was: a couple of things to look at]

Mon Mar 4 02:11:11 EET 2024

On 03/03/2024 18.33, Jim DeLaHunt wrote:
> Mark:
> 
> On 2024-03-02 19:51, Mark Filipak wrote:
>> I have a couple of things to look at.
>>
>> https://markfilipak.github.io/Video-Object-Notation/Streams.html
>> https://markfilipak.github.io/Video-Object-Notation/GOP%20%26%20Frame%20Reordering.html
>>
>> Comments are welcome. Please be brutal. 'Streams' is crucial.
> 
> Good work!

Thank you.

> Regarding the Streams illustration <https://markfilipak.github.io/Video-Object-Notation/Streams.html>:
> 
> The macroblock to slice to picdata transition is clear. Showing 45 macroblocks in a horizontal slice 
> works.  Good work.
> 
>... It is hard to count out 45 from single-digit numbers. 00..44 would be much clearer.

I agree, and I would have "0..44" if I could. If I used 2-digit numbers, I'd have to almost double 
the table width. The issue is that Firefox doesn't support 'font-size' style, so making the font 
smaller to fit can't be done.

> The complete list of 0..29 slices is visually overwhelming, and not necessary.  I think you could 
> keep slices 0..2, elide slices 3..27 with a vertical ellipsis, and keep slices 28..29.  That would 
> get the slice structure across.

I'm going for visual impact, too. Do you find what I have confusing?

> The slice structure lacks a comment with size, of the sort you included for macroblock and picdata. 
> The full slice structure does not leave any room for such a comment.

Well, I felt that with all 30 slices and all 1350 macroblocks explicitly shown, comments were 
superfluous. They will get looked at one time, then ignored for the rest of time.
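
For anyone checking the arithmetic: 45 macroblocks per slice times 30 slices gives the 1350 
macroblocks. Those counts are consistent with a 720x480 picture cut into 16x16 macroblocks with 
one slice per macroblock row. The 720x480 size and the one-slice-per-row layout are assumptions 
here (the illustration only states the counts), and macroblock_grid is a made-up helper name. 
A minimal sketch in Python:

```python
MB_SIZE = 16  # an MPEG-2 macroblock covers 16x16 luma samples


def macroblock_grid(width, height, mb_size=MB_SIZE):
    """Return (macroblocks per row, macroblock rows, total macroblocks).

    Assumes one slice per macroblock row, so 'rows' is also the slice count.
    Dimensions are rounded up to a whole number of macroblocks.
    """
    mbs_per_row = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return mbs_per_row, rows, mbs_per_row * rows


# Assumed 720x480 picture; not stated in the illustration itself.
cols, rows, total = macroblock_grid(720, 480)
print(cols, rows, total)  # 45 30 1350
```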

> Regarding the GOP & Frame Reordering illustration, 
> <https://markfilipak.github.io/Video-Object-Notation/GOP%20%26%20Frame%20Reordering.html>:
> 
> Time is plastic in illustration space also. You have term definitions which happen after the first 
> use of those terms. It would be easier to follow if the term definitions could come at first use.
> 
> The opening text, "an I-frame followed by P-frames and optional B-frames", could be improved by 
> adding term definitions. e.g. "an I-frame (complete unto itself, sometimes called keyframe) followed 
> by P-frames (predictive based on differences with the preceding I-frame) and optional B-frames 
> (bipredictive based on differences with the preceding P-frame and I-frame)".

Thanks, Jim. That's your style.

> The first rectangle, GOP specimen, gives a particular frame order. Which order is this? Is this the 
> order of frames in the incoming data stream, before reordering? That specimen seems to be in PTS 
> order.  Is this necessary, or coincidental?

Yes, frames in the incoming stream, before reordering, are in PTS order.

> What reordering happens in the first step?  Is it reordering from incoming stream order to DTS order?

Yes.

> I don't get how the conveyor belt metaphor and illustrations add value.

They can easily be visualized and they are memorable.

> Then show arrows from that sequence down to the same frames, in PTS order.
> 
> It is not clear to me why the final two B-frames have later DTSs than the following I-frame, but 
> earlier PTSs.  Why would these B-frames not be relative to the first I-frame?

They are between the last P-frame and the next I-frame of the next GOP. They have no relation to the 
I-frame back at the beginning of their own GOP other than through the P-frame.

> If they are relative to the second I-frame, why would that I-frame not have an earlier DTS?

It does. When reordered, the next GOP's I-frame is decoded before the previous GOP's B-frames. You 
see that every time in every video that has B-frames.
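
The reordering can be sketched in a few lines of Python. This is a simplified model, not 
anything taken from H.262 or from FFmpeg: it assumes every B-frame waits for the next I- or 
P-frame in display order, and the frame labels (I0, B1, ...) and the function name decode_order 
are made up for illustration.

```python
def decode_order(display):
    """Reorder frame labels from display (PTS) order to decode (DTS) order.

    Simplified model: each B-frame is held back until the next anchor
    frame (I or P) in display order has been emitted.
    """
    out = []
    pending_b = []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)      # hold until the next anchor arrives
        else:
            out.append(frame)            # anchor frame goes out first
            out.extend(pending_b)        # then the B-frames that preceded it
            pending_b.clear()
    out.extend(pending_b)                # trailing B-frames with no later anchor
    return out


# One GOP in display order, followed by the next GOP's I-frame (I9).
display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]
print(" ".join(decode_order(display)))
# I0 P3 B1 B2 P6 B4 B5 I9 B7 B8
```

B7 and B8 are displayed before I9, yet they come out after it in decode order.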

> Are the B-frames relative to the final P-frame before them?

To my understanding, yes. That's what the page is about.

> What is going on visually that the encoder would choose to sequence things this way?

To my understanding, it's complying with the specifications: MPEG ISO & ITU.

> It is great to have a reference to the specification which you are illustrating, "ITU-T H.262 
> (02/2012)". It would be even better to have that at the beginning. The illustration might explain 
> its goal, e.g. "This illustrates the Group of Pictures and frame reordering operations as described 
> in ITU-T H.262 (02/2012)."

It's a matter of writing style. I prefer to not justify something until after I've said it, if at all.

> And, these diagrams are amazing works of character graphics. They would be even more amazing as 
> works of vector graphics. But drawing them in vector graphics would require a different skill-set.

They can be swipe-copied and pasted as plain text. You can't do that with either tables or vector 
graphics. I consider that important.

Thanks for your thoughts, Jim
--Mark.