[MPlayer-dev-eng] More on inverse telecine!

Sun Apr 13 19:35:36 CEST 2003

On Sun, Apr 13, 2003 at 07:50:29AM -0500, Billy Biggs wrote:
> D Richard Felker III (dalias at aerifal.cx):
> 
> > I've been working on a new, much better inverse telecine filter to
> > replace vf_detc. So far, it's working quite well, but there are a few
> > nasty special cases it doesn't detect well, and which vf_detc didn't
> > detect very well either.
> > 
> > The most troubling one is the following: Suppose you have a still
> > picture that's scrolling vertically at almost exactly 1 pixel per
> > (24fps) frame. Now suppose this sequence has been telecined. In the
> > two interlaced frames out of a five-frame sequence, the upper and
> > lower fields are identical due to the scrolling. According to any sane
> > metric I can think of for measuring interlacing, this condition is the
> > most non-interlaced you can get, absolutely zero interlacing. However
> > to a human being both these frames look really awful and are obviously
> > the ones that need to be fixed by merging fields across frames.
> > 
> > Could anyone suggest to me a decent way to detect such a condition? If
> > I can find a solution to this problem that's not a huge nasty hack, I
> > think we'll finally have really good non-beta-ish inverse telecine!
> > the code so far is working on all the samples uploaded to mphq, except
> > a few broken ones, and working mostly very well on SE:Lain, which has
> > proved again and again to be much more difficult to de-telecine than I
> > ever originally imagined!
> 
>   I'm doing pulldown detection in tvtime now, so I'm back in the scene.
> When you come across the titles, shouldn't you maintain the current
> pulldown pattern so long as the results are consistent?  That is, I know
> what I try and do is, once I've found the phase, unless I get numbers
> that conflict with it, maintain the current phase.

That's what detc does. However, I've become very doubtful as to
whether this is a good idea... More later.

>   I'm curious about your difference metric too.  I've had some bad bad
> test cases lately.  If you don't mind, I'm going to port some of your
> code into tvtime and see what results I get.

(Feel free to port code as long as it's GPL! :)

Well there are several metrics I use. For each 8x8 block, I compute:

1) Sum abs(this frame's even lines - prev frame's even lines).
2) Sum abs(this frame's odd lines - prev frame's odd lines).

And several 'noise' figures, computed as the n=4 Fourier coefficient of
each column, i.e. inner product with (+1,-1,+1,-1,+1,-1,+1,-1).

3) Noise of prev. frame.
4) Noise of current frame.
5) Noise when even field of current frame is merged with odd field of
   prev frame.
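
For illustration, stats 1-5 for a single 8x8 block could be sketched
roughly as below. This is not the filter's actual code (which is C
working on raw image planes). The function names are made up for the
example, and summing the absolute value of each column's inner product
across the block is an assumption about how the per-column figures get
combined.

```python
# Hypothetical sketch: a block is a list of 8 rows, each a list of 8 luma values.
SIGNS = (+1, -1, +1, -1, +1, -1, +1, -1)

def field_diff(cur, prev, parity):
    """Sum of abs differences over the lines of one parity (0 = even, 1 = odd)."""
    return sum(abs(c - p)
               for y in range(parity, 8, 2)
               for c, p in zip(cur[y], prev[y]))

def comb_noise(block):
    """Sum over columns of |inner product of the column with SIGNS| (assumed combination)."""
    total = 0
    for x in range(8):
        total += abs(sum(SIGNS[y] * block[y][x] for y in range(8)))
    return total

def merge_fields(even_src, odd_src):
    """Build a block with even lines from even_src and odd lines from odd_src."""
    return [even_src[y] if y % 2 == 0 else odd_src[y] for y in range(8)]

def block_stats(cur, prev):
    """The five per-block figures, in the order listed above."""
    return {
        "even": field_diff(cur, prev, 0),
        "odd": field_diff(cur, prev, 1),
        "noise_prev": comb_noise(prev),
        "noise_cur": comb_noise(cur),
        "noise_merged": comb_noise(merge_fields(cur, prev)),
    }
```

With a flat previous block and a current block whose odd lines come
from a different picture, the odd difference and the current noise are
both large while the merged noise drops to zero.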

Then, once I have all these stats, I take the max of each individually
over all 8x8 blocks, and I take the max over all blocks of the
following per-block differences as well (even and odd being stats 1
and 2 above):

even-odd
odd-even
current noise-merged noise
merged noise-current noise
prev noise-merged noise
merged noise-prev noise

One dead giveaway that a frame is part of the telecine pattern is when
the max even-odd is massively smaller than max odd-even. The
difference is often as significant as 80 vs. 3000!
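
A minimal sketch of that aggregation step, assuming each block's stats
arrive as a dict with the (made-up) keys used below. The `ratio`
threshold in `looks_telecined` is an arbitrary illustrative value, not
something taken from the filter.

```python
# Hypothetical key names; each dict holds the five per-block figures.
DIFFS = {
    "even-odd": ("even", "odd"),
    "odd-even": ("odd", "even"),
    "cur-merged": ("noise_cur", "noise_merged"),
    "merged-cur": ("noise_merged", "noise_cur"),
    "prev-merged": ("noise_prev", "noise_merged"),
    "merged-prev": ("noise_merged", "noise_prev"),
}

def frame_maxima(blocks):
    """Max of each stat, and of each per-block difference, over all blocks."""
    blocks = list(blocks)
    out = {key: max(b[key] for b in blocks) for key in blocks[0]}
    for name, (a, b) in DIFFS.items():
        out[name] = max(blk[a] - blk[b] for blk in blocks)
    return out

def looks_telecined(maxima, ratio=10):
    """True when max even-odd is far smaller than max odd-even.

    ratio is an assumed threshold chosen only for this example.
    """
    return maxima["odd-even"] > ratio * max(maxima["even-odd"], 1)
```

With figures like the 80 vs. 3000 case, `looks_telecined` returns True
for any `ratio` below 37.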

Anyway, back to the point...

My old detc filter was a complicated state machine that predicted
everything based on past info and the contents of the current frame.
So if it thought a frame might be the first interlaced frame of a
telecine sequence, it would skip showing it, and wait for the next
frame to come in. Then, once it got that next frame, it had to check
whether that next frame was an 'ok match' for the skipped frame, and
merge them if so; otherwise break out of the pattern sync.

This worked ok with most content, but with really strange stuff like

>   BTW, something fun:
> 
>   http://sourceforge.net/mailarchive/forum.php?thread_id=1938923&forum_id=6588

it would have a hell of a time. The situation with Serial Experiments
Lain is really bad too -- you should try it for testing your code! :)
But even with normal, "well behaved" telecine, it's commonplace for
edits to have been made after the telecine process, and that means
sometimes a scene change will come right after the first telecine
frame.

This isn't so bad, since you can just drop the frame and go ahead with
the scene change, but there's also another nasty possibility! The
first frame after the scene change can be one of the interlaced
frames, and you have no previous reference frame to compare it against
to tell if it's interlaced!

Thus, I decided with my new filter to delay video output by one frame.
This way, I can use the metrics above in a much smarter way to decide
what to do with a frame:

If the current frame's odd field matches the next frame's even field
better than it matches its own even field, merge them! Otherwise just
show it
normally. There are a few more tests using the even/odd stuff since it
tends to be a lot more reliable than the 'noise' numbers, but the
logic is so much simpler. Rather than having the complicated state
machine and having to worry about it getting synced to the wrong
pattern or keeping an old pattern after a scene change, the filter
simply detects *every* occurrence of interlacing. Or at least it's
supposed to. Right now, the only case I find it to be missing is that
horrible one-line-per-frame vertical scrolling.
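
The core of that decision could be sketched as follows. This is only
an illustration under assumptions: frames are lists of rows, "matches
better" is measured with the alternating-sign column sum over bands of
8 rows, and the extra even/odd tests mentioned above are left out. All
names are hypothetical.

```python
def comb(frame):
    """Total |alternating-sign column sum| over each band of 8 rows (assumed measure)."""
    total = 0
    for top in range(0, len(frame) - 7, 8):
        for x in range(len(frame[0])):
            total += abs(sum((1 if y % 2 == 0 else -1) * frame[top + y][x]
                             for y in range(8)))
    return total

def weave(even_src, odd_src):
    """Frame with even lines from even_src and odd lines from odd_src."""
    return [even_src[y] if y % 2 == 0 else odd_src[y]
            for y in range(len(even_src))]

def choose_output(cur, nxt):
    """Pick what to show for cur, given that nxt has already been received."""
    merged = weave(nxt, cur)  # next frame's even field + current frame's odd field
    if comb(merged) < comb(cur):
        return merged         # the odd field fits the next frame's even field better
    return cur                # otherwise show the frame as it is
```

Because the output is delayed by one frame, `nxt` is always on hand
when the decision for `cur` is made, so no prediction from earlier
frames is needed.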

So, that brings us back to my original question... :(
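
To make the problem case concrete, here is a small sketch that builds
a telecined frame from a picture scrolling exactly one line per film
frame. The picture contents, the scroll direction and the function
names are all assumptions made up for the demonstration.

```python
def comb(frame):
    """Sum over columns of |alternating-sign column sum| for an 8-row frame."""
    return sum(abs(sum((1 if y % 2 == 0 else -1) * frame[y][x] for y in range(8)))
               for x in range(len(frame[0])))

def film_frame(img, t):
    """8-row window into img, scrolled down by one line per film frame t."""
    return [img[y - t + 8] for y in range(8)]

# A tall still picture in which every row has a different value.
img = [[(r * 37) % 256] * 8 for r in range(24)]

a = film_frame(img, 0)
b = film_frame(img, 1)

# Telecined mixed frame: even field from film frame a, odd field from b.
mixed = [a[y] if y % 2 == 0 else b[y] for y in range(8)]

# The two fields are identical, so every line is simply doubled and the
# alternating-sign measure is exactly zero.
fields_identical = all(mixed[y] == mixed[y + 1] for y in range(0, 8, 2))
print(fields_identical, comb(mixed), mixed == a)
```

The mixed frame is not equal to either film frame (it has half the
vertical detail), yet a measure based on line-to-line alternation sees
nothing wrong with it.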

By the way, one more thing about sticking to the telecine pattern.
Lots of times, especially in anime, you have some motion, followed by
a period of total stillness. If you track the pattern and keep merging
frames during the stillness, you might actually reduce the picture
quality, due to quantization errors in the source file/dvd and slight
noise/discrepancies between the frames due to analog processing during
production. This won't hurt visually in most cases, but it does waste
bits if you'll be encoding.

So as long as my frame drop code keeps up with the pattern (dropping
the fifth frame after the previous one dropped), the results should
come out just as good as if we'd tracked the telecine pattern and used
it to merge/drop frames, and without impacting encoding.
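
The drop cadence itself is just one frame in five. A trivial sketch,
assuming the index of the first dropped frame is already known (the
function name and its parameters are hypothetical):

```python
def frames_to_keep(n_frames, first_drop):
    """Indices kept when every fifth frame is dropped, starting at first_drop."""
    return [i for i in range(n_frames)
            if i < first_drop or (i - first_drop) % 5 != 0]
```

For example, 30 input frames with the first drop at index 4 leave 24
frames, which is the 30 to 24 reduction inverse telecine is after.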

OK, I need to run soon. Thanks for the input. I'll write more later.

Rich