[MPlayer-dev-eng] New inverse-telecine filter

Wed Dec 3 21:48:01 CET 2003

On Wed, Dec 03, 2003 at 03:34:04AM -0600, Zoltan Hidvegi wrote:
> > On Tue, Dec 02, 2003 at 10:10:08AM -0600, Zoltan Hidvegi wrote:
> > > This is a new inverse telecine filter, similar to detc, ivtc and pullup,
> > > but better.  Here is a summary of how it works.
> > 
> > The name is very misleading. It implies the filter is better than
> > pullup, which is certainly not true, since it's impossible to do
> > correct inverse telecine without buffering fields ahead. At the very
> > least, you need one field into the future. And there's a reason pullup
> > buffers 6 fields...
> 
> For mencoder-g1 it is better than pullup, since pullup cannot keep the
> framerate. 

I guess you mean force 24 fps output, and you're right. But it's still
not better to output incorrect frames. I can upload some samples where
your code will output incorrect frames if you'd like. :)

Anyway, I'll be happy to commit your filter to MPlayer, but the name
pullup2 is very misleading and I won't commit with that name. Pullup
is the engine I wrote independent of MPlayer to be very versatile and
reusable in other programs, and "pullup2" is not version 2 of that,
but a very different approach.

> The licomb metrics of
> pullup are much less accurate than the metrics I've used.  pullup
> takes one field, approximates the other and takes the difference, so
> it has to do a half-pixel shift for one field while leaving the other
> field in place.  It is more accurate to shift both fields by a quarter
> instead.  It's basically a [1 -3 3 -1] filter, instead of [-1 2 -1]
> used by licomb.  Normally for linearly approximating smooth functions
> the quarter shift has 4x less error than the half shift, but I have to
> do two quarter shifts, so it's just 2x better, but actually it happens
> to be better than that, since [1 -3 3 -1] is zero for quadratic
> functions, so the error is cubic. 

Agree totally. I'll change it.
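
The kernel argument in the quoted paragraph can be checked numerically.
The following is a minimal sketch, not code from pullup or from the
proposed filter; the function names are hypothetical and the kernels are
left unnormalized. Lines a and c belong to one field, b and d to the
other.

```c
/* Half shift: interpolate one field at the other field's line position,
 * (a + c)/2 - b, scaled by -2. This is the [-1 2 -1] kernel. */
static int comb_half(int a, int b, int c)
{
    return -a + 2 * b - c;
}

/* Quarter shift: interpolate both fields at the midpoint between lines
 * b and c, (a + 3c)/4 - (3b + d)/4, scaled by 4. This is [1 -3 3 -1]. */
static int comb_quarter(int a, int b, int c, int d)
{
    return a - 3 * b + 3 * c - d;
}

/* A smooth (quadratic) vertical luma profile used as test input. */
static int quad(int y)
{
    return 2 * y * y + 3 * y + 5;
}
```

On the quadratic profile the half-shift kernel leaves a residual while
the quarter-shift kernel returns exactly zero, so its error on smooth
content starts at the cubic term.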

> The frame-break detection algorithm
> of pullup did not seem to be that good, but that's certainly fixable.

Why do you say this? All the decisions are based on thorough
consideration of all the cases, so that it does not generate incorrect
output. The only thing I don't consider is nasty broadcast noise or
watermarks (imo a form of broadcast noise :).

> Also the licomb metric in pullup may have a bug, as it really looks at
> 16 lines (8 line pairs), I think it should really be 4 pairs.

Hmm, I'll check this!

> It may be true that you need lookahead if you want to be absolutely
> correct, but once you find a frame boundary, you can display the next
> two fields as long as those two fields seem to match well. 

The question is: can you find the frame boundary? Lots of times it's
harder than you'd expect. I imagine you're working with trivial
content (live action) rather than difficult stuff like animation.

> And for watching live TV
> it is an advantage that you have no delay, and it was one of my goals,
> to have a filter good for live TV.

I agree this is useful. But you absolutely need at least one field
delay or you _WILL_ mess up at scene change splices.

> Another problem I had, which may be just me not really understanding
> mplayer, is that pullup, detc, ivtc and many other filters call
> vf_get_image with mpi->width, mpi->height, even though other filters
> use mpi->w, mpi->h, and I think the latter is the correct one.

Yes this is incorrect. :/ I'll fix it. Sometime.

> > The pullup module was already designed to handle all the buffer
> > management and field merging with minimal copy overhead. It's also
> 
> My pullup2 has zero copy for most frames; only the merged frames,
> usually 1 out of 4, are copied.  The pullup filter can do direct
> rendering, but not after a crop filter, which uses EXPORT.  I'm

The solution is to set the boundary sizes for pullup to ignore and
then perform the actual crop after pullup. There's no way to do this
yet from the command line, but adding it is trivial.

> encoding HDTV content where much of the content is scaled-up SDTV,
> with black areas on the left and the right, so I have to crop from
> 1920x1080 down to 1408x1052.  And crop does not pass the field flags,
> and my patch to pass those was not accepted.  But even if it is, after
> crop you cannot do direct rendering.  Maybe in G2 you can?  So
> actually my filter does less copy than pullup if you need to crop.

I disagree. It's never possible to direct render through crop since
the source filter doesn't have enough space to draw into. But in
practice it's not useful to direct render beyond this point anyway
since your buffers will be in (nonreadable) video memory, and even if
the codec doesn't have to read B frames, your inverse telecine engine
has to read all frames to make decisions.

With pullup, the drawing will be:

  codec --dr--> pullup --export--> vo
                  OR
  codec --dr--> pullup --export--> crop --export--> vo

This is optimal for vo's that write directly to video memory, and
considering that G1 doesn't allow you to get lots of buffers ahead
(and pullup needs to!), it's optimal for other vo's too.

> > meant to allow plugging in new decision algorithms, although this code
> > isn't in place yet. If you think you have better algorithms for
> > choosing how to put frames together, I would suggest making them
> > switchable in pullup, rather than writing an entirely new filter. We
> > already have too many separate filters for playing around with ivtc
> > algorithms...
> 
> I'll try to understand pullup better, but some parts of it, especially
> the buffer management, seemed to be more complicated than necessary,
> maybe because it is really for G2?

Because it has to buffer fields ahead to make accurate decisions. Also
the API makes it possible to use field-at-a-time source instead of
frame-based source, in case your capture device or codec gives fields
instead of fake frames. And if you read the code a little more, I
think you'll see that factoring out the 'complicated' buffering layer
makes the actual decision-making code very concise and readable, and
it's naturally independent of input field order without any hacks.

> > > The filter works on
> > > even-first (a.k.a. top-field first) frames, for a bottom-first frame the
> > > first line is skipped to make it top-first.
> > 
> > This is rather awkward. Why not just invert stride and start at the
> > bottom?
> 
> Sure, that works too, but I do not see how adding the stride once is
> more awkward than reversing it?  You already skip the pixels close to
> the edges, skipping one more line at the top does not make any
> difference.  This is only for calculating the metrics.  I also skip on
> the left to make sure I start at an 8-byte aligned address to speed up
> MMX.

I'm not quite sure how your code works on mixed input where some input
frames are TFF and others BFF (mixed hard+soft telecine)...
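
For reference, the two ways of scanning a bottom-first frame discussed
above can be sketched as follows. This is only an illustration: the
line_view type and the helper names are hypothetical, an even frame
height is assumed, and neither function is taken from either filter.

```c
#include <stddef.h>

/* A view onto a plane: pointer to the first scanned line plus the step
 * (in bytes) from one scanned line to the next. */
typedef struct {
    const unsigned char *first;
    ptrdiff_t stride;
} line_view;

/* Approach A: skip the first line, so scanning starts on a line of the
 * bottom field and the metric code can treat the frame as top-first. */
static line_view view_skip_first(const unsigned char *plane, ptrdiff_t stride)
{
    line_view v = { plane + stride, stride };
    return v;
}

/* Approach B: start at the last line and walk upward with a negative
 * stride. With an even height the last line is a bottom-field line. */
static line_view view_flipped(const unsigned char *plane, ptrdiff_t stride,
                              int height)
{
    line_view v = { plane + (height - 1) * stride, -stride };
    return v;
}

/* Address of the n-th scanned line in a view. */
static const unsigned char *view_line(line_view v, int n)
{
    return v.first + n * v.stride;
}
```

Either view starts on a bottom-field line; approach A loses one line of
the frame, while approach B covers all lines in reverse order.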

> name for inter-frame and intra-frame licomb, and I was too lazy to invent
> new names.  I know that for the pullup filter everything looks like a
> 60fps sequence, so there is no such difference.

Yeah. After all the mess of detc and ivtc, I think treating everything
as 60 field/sec sequences and ignoring input "frame" arrangement makes
it all much simpler.

> > > Using these statistics, the find_breaks routine tries to find frame
> > > breaks, see the code for details.  This routine is probably not perfect
> > > yet.
> > 
> > Again, you reuse terminology from the other filters to mean something
> > radically different.
> 
> I'm not that familiar with mplayer, so I was really not aware that I
> was confusing the terminology, please tell me what terminology I
> should use.

Maybe I should type my handwritten design docs for pullup and commit
to DOCS/tech... Basically, in my terminology, a break means a point
where fields necessarily can't go together. Breaks arise from
duplicate fields, so they only appear adjacent to duration-3 frames
with motion relative to the previous/next frame.
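
To illustrate where duration-3 frames and their duplicate fields fall,
here is a small sketch that assumes an idealized, uninterrupted 3:2
cadence. Both helper names are hypothetical; a real stream carries no
such index, so a filter has to infer the duplicates from field
differences.

```c
/* Film frame shown by output field i when film frames alternately last
 * 3 and 2 fields: fields 0,1,2 -> frame 0, fields 3,4 -> frame 1, ... */
static int film_frame_of_field(int i)
{
    int r = i % 5;
    return 2 * (i / 5) + (r >= 3);
}

/* Field i is repeated two fields later (same parity) exactly when both
 * fields come from the same film frame, i.e. at a duration-3 frame. */
static int is_repeated_field(int i)
{
    return film_frame_of_field(i) == film_frame_of_field(i + 2);
}
```

Under this assumption one field in every five is a duplicate, and the
fields just before and just after a duration-3 frame are the places
where a break can show up.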

Pullup also has a notion of field "affinity", whereby each field is
said to "prefer" one or neither of its two potential partners
according to the "licomb" metric.
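
A minimal sketch of the affinity idea follows. It is an assumption made
for illustration, not pullup's actual rule: the names, the margin
parameter and the comparison are all hypothetical. A lower comb value is
taken to mean a better match with that neighbour.

```c
enum { AFF_PREV = -1, AFF_NONE = 0, AFF_NEXT = 1 };

/* Decide which neighbour a field "prefers", given its comb metric
 * against the previous and the next field. The margin (hypothetical)
 * keeps near-ties from being counted as a preference. */
static int field_affinity(int comb_prev, int comb_next, int margin)
{
    if (comb_prev + margin < comb_next)
        return AFF_PREV;   /* combs clearly less with the previous field */
    if (comb_next + margin < comb_prev)
        return AFF_NEXT;   /* combs clearly less with the next field */
    return AFF_NONE;       /* no clear preference */
}
```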

I then constructed a state machine by considering all the
possibilities for affinity and breaks and assuming they accurately
reflect what they're supposed to measure, when they are available.

...

Anyway, to summarize, you have some very nice ideas for improving
inverse telecine, but your code isn't and shouldn't be called pullup
v2. If you have another good name for it, and as long as it works when
I test it, I'll be happy to commit. I'd also like to take some of the
ideas you suggested for improving pullup, like fixing the bug in
licomb computation (if it exists) and using the qpel translate to
improve accuracy.

Cheers!

Rich