[MPlayer-G2-dev] vo3
D Richard Felker III
dalias at aerifal.cx
Sun Dec 28 01:31:36 CET 2003
On Sat, Dec 27, 2003 at 01:01:22AM +0200, Ivan Kalvachev wrote:
> Hi, here are some of my ideas.
> I'm afraid it's already too late for them to be implemented, as
> dalias is coding his pipeline system while I haven't finished the
> drafts yet, but feel free to send comments...
Time to pour on the kerosene and light the flames... :)
First comment: putting everything in a PDF file makes it very
difficult to quote and reply to individual parts of your draft. I'll
try my best... [all following quotes are from the PDF]
> Here are a few features that I'm trying to achieve:
> decreasing memcpy by having all filters that can do it share one
> and the same buffer (already done in G1 as Direct Rendering method 1)
> support for partial rendering (slices and DR method 2)
These are obviously necessary for any system that's not going to be
completely unusably sucky. And they're already covered in G1 and G2
VP.
> support for get/release buffer (ability to release
> buffers when they are no longer needed)
This is not so obvious at first, but absolutely necessary for
overcoming bugs in G1 that prevented all but the simplest filters from
using buffer sharing/DR. It's also already covered in G2 VP -- in fact
it was one of the two key design points.
> out-of-order rendering: the ability to move the data through the
> video filters when there is no temporal dependency
> display-order rendering: this is for filters that need to use
> temporal dependencies
Ivan and I disagree greatly on the nature of these goals. To me,
they're a simple consequence of a natural way of thinking about frame
passing and slice rendering. To him, out-of-order is the fundamental
frame passing protocol, and special care is required for handling
frames in order.
> ability to keep as many incoming images as needed and to output as
> many images as a filter may need (e.g. in case of motion blur we
> might have 6 incoming images and 6 outgoing at once)
> support for PTS.
These were the primary motivation behind G2 VP.
> ability to quickly reconfigure and, if possible, to reuse data
> that is already processed (e.g. we have scale and the user resizes
> the image - only images after scale will be redone),
In my design, this makes no sense. The final scale filter for resizing
would not pass any frames to the vo until time to display them.
> safe seeking, auto-insertion of filters.
What is safe-seeking?
Auto-insertion is of course covered.
> ability to have more complicated graph (than simple chain) for
> processing.
This is definitely a desirable goal.
> simple structure and flexible design.
IMNSHO the out-of-order stuff in Ivan's design is anything but simple.
> In short, the ideas used are:
> common buffer and separate mpi - already exists in G1 in some form
> counting buffer usage through the mpi and freeing the buffer once
> unused (huh, sounds like Java :O)
No. Reference counting is good. GC is idiotic. And you should never
free buffers anyway until close, just repool them.
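Something like this (a minimal sketch; all names are hypothetical,
not the actual G2 structs):

    struct mp_buffer {
        unsigned refcount;          /* mpi's referencing this buffer */
        struct mp_buffer *next;     /* free-list link while pooled */
        unsigned char *planes[3];   /* Y, U, V */
    };

    static struct mp_buffer *pool;  /* repooled buffers, never free()d */

    void buffer_release(struct mp_buffer *b)
    {
        if (--b->refcount == 0) {   /* last reference dropped */
            b->next = pool;         /* back into the pool, no free() */
            pool = b;
        }
    }

    struct mp_buffer *buffer_get(void)
    {
        struct mp_buffer *b = pool;
        if (!b)
            return NULL;            /* pool empty: allocate a fresh one */
        pool = b->next;
        b->refcount = 1;
        return b;
    }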
> allocating all mpi&buffers before starting to draw (looks obvious,
> doesn't it?) - in G1 filters had to copy frames into their own
> buffers or gamble by using buffers out of their scope
Yes, maybe G1 was broken. All the codecs/filters I know allocate
mpi/buffers before drawing, though.
> using a flag IN_ORDER to indicate that these frames are "drawn"
> and that no frames with "earlier" PTS will come.
I find this really ugly.
> using a common function for processing frames and slices to make
> slice support easier
This can easily be done at the filter implementation level, if
possible. In many cases, it's not. Processing the image _contents_ and
the _frame_ are two distinct tasks.
> emulating complicated graph in a simple linked list.
This sounds like an ugly hack.
> messaging system for dropping/rebuilding MPIs (not yet finished)
Very bad.
> having ready-made simple filter templates (like non-temporal
> one-input/one-output, processing the frame as it comes, without
> caring about buffer management) (not documented)
Also provided for in G2 VP.
> [...]
> So, the frame is split into 2 parts, one I will call mpi and the
> other I will call mp_buffer. The mp_buffer part contains the
> memory buffer, usage count, buffer common width and height, maybe
> stride. The mp_buffer->count is the number of MPIs that point to
> that buffer. Probably we may allow a buffer to contain more than
> one piece of memory (e.g. 3 memory blocks for the Y, U, V planes).
An idea like this was already suggested by Arpi and adopted in G2 VP,
but not as extreme. The reason for not making such a sharp division is
that the owner of the buffer will often _need_ to know about the
buffer's status as the contents of a given frame, not just which
buffer it is.
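Roughly, the proposed split looks like this (field names are
hypothetical), and the point is that the per-frame state in the mpi
is something the buffer's owner often still has to care about:

    /* shared backing store, one per memory buffer */
    struct mp_buffer {
        int count;                /* number of mpi's pointing here */
        int width, height;        /* common buffer dimensions */
        int stride[3];
        unsigned char *planes[3];
    };

    /* per-frame view of a buffer */
    struct mpi {
        struct mp_buffer *buf;    /* which buffer holds the pixels */
        double pts;               /* frame-specific state lives here, */
        int flags;                /* and the buffer's owner usually
                                     needs to know about it too */
    };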
One thing omitted in G2 so far is allowing for mixed buffer types,
where different planes are allocated by different parties. For
example, exporting U and V planes unchanged and direct rendering a new
Y plane. I'm not sure if it's worth supporting this, since it would be
excessively complicated. However, it would greatly speed up certain
filters such as equalizer.
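If it were supported, per-plane ownership might look something like
this (purely hypothetical sketch):

    enum plane_owner {
        PL_EXPORTED,    /* plane borrowed unchanged from upstream */
        PL_DR,          /* plane direct-rendered into the vo */
        PL_LOCAL        /* plane allocated by the filter itself */
    };

    struct mixed_buffer {
        unsigned char *planes[3];
        enum plane_owner owner[3];  /* equalizer would want
                                       {PL_DR, PL_EXPORTED, PL_EXPORTED}:
                                       a new Y plane, untouched U/V */
    };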
> [...]
> This scheme also allows us to get rid of the static buffer type.
> Simply, the decoder will never release its mpi, but will pass it
> to the filter chain multiple times (like ffmpeg's reuse). On the
> other side, static buffers should always be in main memory,
> otherwise they could take the only display buffer and stall
> displaying (e.g. a vo with one buffer, and a decoder with 2 static
> buffers)
This is the same principle as the REUSABLE flag in G2 VP, except that
DR buffers are also allowed to be reusable in my design.
> Dalias already pointed out that processing may not be strictly top
> to bottom, and may not be line, slice, or block based. This
> question is still open for discussion. Anyway, the most flexible
> x,y,w,h way proved to also be the hardest and totally painful.
> Just take a look at the crop or expand filters in G1. Moreover,
> the current G1 scheme has some major flaws:
> the drawn rectangles may overlap (it depends only on the decoder)
No, my spec says that draw_slice/commit_slice must be called exactly
once for each pixel. If your codec is broken and does not honor this,
you must wrap it or else not use slices.
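A wrapper could enforce this fairly cheaply, e.g. with per-row
coverage counts. A sketch (assuming full-width slices for brevity;
real slices also carry x and w):

    #include <assert.h>

    #define MAX_H 4096

    struct slice_check {
        unsigned char rows[MAX_H];  /* times each row has been drawn */
        int h;                      /* image height */
    };

    void check_slice(struct slice_check *c, int y, int h)
    {
        for (int i = y; i < y + h; i++)
            assert(++c->rows[i] == 1);  /* no pixel drawn twice */
    }

    void check_frame_done(struct slice_check *c)
    {
        for (int i = 0; i < c->h; i++)
            assert(c->rows[i] == 1);    /* no pixel left undrawn */
    }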
> drawing could be done in any order. This makes it very hard to
> say what part of the image is already processed
I agree, it's very ugly. IMO there should at least be certain minimal
restrictions on slice structure, but I don't know what they should be.
In any case, I don't like Ivan's idea of restricting slices to
macroblock-high horizontal strips drawn in-order from top to bottom...
Certainly broken codecs like VP3 will want to draw bottom-to-top.
> skipped_blocks processing is very hard. Theoretically it is
> possible to draw only non-skipped blocks, but then the above
> problem arises.
I would _really_ like a clean solution to skipped_blocks processing.
It's the final key to speed which we haven't solved... :(
> The main problem is the out-of-order rendering. The filters
> should be able to process the frames in the order they come. On
> the other side, there are some filters that can operate only in
> display order. So what is the solution?
>
> By design the new video system requires PTS (picture time stamp). I
PTS stands for PRESENTATION time stamp, not picture time stamp.
> add a new flag that I call IN_ORDER. This flag indicates that all
> frames before this one are already available in the
> incoming/outgoing area. Let's make an example with MPEG IPB order.
>
> We have decoding order IPB and display order IBP.
> First we have an I frame. We decode it first and we output it to
> the filters. This frame is in order, so the flag should be set
> for it (while processing). Then we have a P-frame. We decode it,
> but we do not set the flag (yet). We process the P-frame too.
> Then we decode a B-frame that depends on the previous I and P
> frames. This B-frame is in order when we process it. After we
> finish with the B-frame(s), the first P-frame is in order.
This idea is totally broken, as explained by Michael on ffmpeg-devel.
It makes it impossible for anything except an insanely fast computer
to play files with B frames!! Here's the problem:
1. You decode first I frame, IN_ORDER.
2. You display the I frame.
3. You decode the P frame. Not IN_ORDER.
4. You decode the B frame. IN_ORDER.
5. You display the B frame, but only after wasting >frametime seconds,
thus causing A/V desync!!
6. The P frame becomes IN_ORDER.
7. You display the P frame.
8. Process repeats.
The only solution is to always impose a one-frame delay at the _decoder_
end when decoding files with B frames. In Ivan's design, this can be
imposed by waiting to set the IN_ORDER flag for an I/P frame until the
next B frame is decoded.
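One way to impose it (a sketch with made-up types and flags, not
real G2 or lavc structures; flagging the held frame on the next
decoded frame of any type keeps the delay at exactly one frame):

    enum ftype { FT_I, FT_P, FT_B };
    #define MPI_IN_ORDER 0x01

    struct frame { enum ftype type; int flags; };

    static struct frame *held_ref;   /* last I/P, not yet IN_ORDER */

    void frame_decoded(struct frame *f)   /* called in decode order */
    {
        if (f->type == FT_B) {
            f->flags |= MPI_IN_ORDER;     /* B is in display order now */
        } else {
            if (held_ref)                 /* previous I/P becomes */
                held_ref->flags |= MPI_IN_ORDER;  /* IN_ORDER only now */
            held_ref = f;                 /* delay this one instead */
        }
    }

After the initial one-frame latency, every decode step releases
exactly one displayable frame, so the display side never stalls.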
> As you can see it is very easy for the decoders to set the
> IN_ORDER flag; it could be done on G1's decode() end, when the
> frames are in order.
Actually, this is totally false. Libavcodec does _not_ export any
information which allows the caller to know if the frames are being
decoded in order or not. :( Yes, this means lavc is horribly broken...
> If an MPI is freed without setting IN_ORDER then we could guess
> that it has been skipped.
Frame sources cannot be allowed to skip frames. Only the destination
requesting frames can skip them.
> Skipping/Rebuilding
This entire section should be trashed. It's very bad design.
> Now the skipping issue arises. I propose 2 flags, to be added
> like the IN_ORDER flag; I call them SKIPPED and REBUILD. I thought
> about one common INVALID, but it would have a different meaning
> depending on the array it resides in (incoming or outgoing)
>
> SKIPPED is required when a get_image frame is obtained but the
> processing is not performed. The first filter sets this flag in
> the outgoing mpi, and when the next filter processes the data, it
> should free the mpi (that is now in the incoming). If the filter
> had allocated another frame, where the skipped frame should have
> been drawn, then it can free it by setting it as SKIPPED.
Turn things around in the only direction that works, and you don't
need an image flag for SKIPPED at all. The filter _requesting_ the
image knows if it intends to use the contents or not, so if not, it
just ignores what's there. There IS NO CORRECT WAY to frameskip from
the source side.
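In other words, roughly (all names hypothetical; the point is that
the skip decision lives at the requesting end):

    struct mpi;                        /* opaque frame reference */
    struct vf { struct vf *prev; };

    struct mpi *vf_request_frame(struct vf *src);
    void vf_release_mpi(struct mpi *f);
    void vf_process(struct vf *vf, struct mpi *f);

    /* the requester decides; upstream never flags anything SKIPPED */
    void pull_one(struct vf *vf, int skip)
    {
        struct mpi *f = vf_request_frame(vf->prev);
        if (skip)
            vf_release_mpi(f);         /* contents simply ignored */
        else
            vf_process(vf, f);
    }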
> E.g. if we have this chain
> -vf crop=720:540,spp=5:4,scale=512:384
> This chain should give quite a thrill to a 2GHz processor. Now
> imagine that scale is auto-inserted and that the vo is some
> RGB-only windowed device (vo_x11). If the user changes the window
> size, the scale parameters change too. Scale should rebuild all
> frames that are processed but not yet shown. The scale filter can
> safely SKIP all frames in the outgoing.
Bad point 1: manually created filters which have been given parameters
MUST NEVER auto-reconfigure. In my design, if the user enabled dynamic
window rescaling, another scale filter controlled by the UI layer
would get inserted, and activated only when the window size was
non-default.
Bad point 2: your "rebuild" idea is not possible. Suppose the scale
filter has stored its output in video memory, and its input has
already been freed/overwritten. If you don't allow for this,
performance will suck.
> [...]
> -vf spp=5,scale=512:384,osd
> [...]
> Now the user turns off OSD that has already been rendered into a
> frame. Then vf_osd sets REBUILD for all affected frames in the
> incoming array. The scale filter will draw the frame again, but
> it won't call spp again. And this gives a big win because vf_spp
> can be extremely slow.
This is stupid. We have a much better design for osd: as it
slice-renders its output, it makes backups (in very efficient form)
of the data that's destroyed by overwriting/alphablending. It can then
undo the process at any time, without ever reading from its old input
buffers or output buffers. In fact, it can handle slices of any shape
and size, too!
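Sketched out (one plane, one rectangular element; everything here
is hypothetical, not the actual vf_osd code):

    #include <string.h>

    struct osd_backup {
        int x, y, w, h;         /* region the OSD element covers */
        unsigned char *saved;   /* w*h bytes of overwritten pixels */
    };

    void osd_draw(struct osd_backup *b, unsigned char *dst, int stride)
    {
        for (int i = 0; i < b->h; i++)    /* back up what we clobber */
            memcpy(b->saved + i * b->w,
                   dst + (b->y + i) * stride + b->x, b->w);
        /* ... then alphablend the OSD element into dst ... */
    }

    void osd_undo(struct osd_backup *b, unsigned char *dst, int stride)
    {
        for (int i = 0; i < b->h; i++)    /* restore; no old input or
                                             output buffers needed */
            memcpy(dst + (b->y + i) * stride + b->x,
                   b->saved + i * b->w, b->w);
    }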
> On the other side, there is one big problem: the mpi could
> already be freed by the previous filter. To work around it we may
> need to keep all buffers until the image is shown (something like
> control(FLIP,pts) for all filters). The same thing may be used on
> seek, to flush the buffers.
This is an insurmountable problem. The buffers will very likely no
longer exist. Forcing them to be kept will destroy performance.
> Problems remaining!
Lots more than you itemize!
> 1. Interlacing: should the second field have its own PTS?
In principle, definitely yes. IMO the easiest way to handle it is to
require codecs that output interlaced video to set the duration field,
and then pts of the second field is just pts+duration/2.
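That is (a sketch; the field names are assumed):

    struct frame { double pts, duration; };

    double field_pts(const struct frame *f, int second_field)
    {
        /* first field at pts, second field half a frame later */
        return f->pts + (second_field ? f->duration / 2 : 0);
    }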
> P.S.
> I absolutely forbid this document to be published anywhere. It is
> only for MPlayer developers' eyes. And please, somebody, remove
> the very old vo2 drafts from the G1 CVS.
Then don't send it to public mailing lists... :)
Sorry but IMO it's impossible to properly respond/comment without
quoting large sections.
So, despite all the flames, I think there _are_ a few really good ideas
here, at least as far as deficiencies in G1 (or even G2 VP) which we
need to resolve. But I don't like Ivan's push-based out-of-order
rendering pipeline at all. It's highly non-intuitive, and maybe even
restrictive.
Actually, the name (VO3) reflects what I don't like about it: Ivan's
design is an api for the codec to _output_ slices, thus calling it
video output. (In fact, all filter execution is initiated from within
the codec's slice callback!) On the other hand, I'm looking for an API
for _obtaining_ frames to show on a display, which might come from
anywhere -- not just a codec. For instance they might even be
generated by visualization plugins from audio data, or even from
/dev/urandom! My design makes the source of the video totally
transparent, rather than making the source the entry point for
everything! And, my design separates image content processing (which
might be able to happen out-of-order) from frame processing (which
always happens in order).
So, Ivan. I'll try to take the best parts of what you've proposed and
incorporate them into the code for G2. Maybe we'll be able to find
something we're both happy with.
With kind flames,
Rich