[MPlayer-users] -vf ilpack

Ville Saari 113762 at foo.bar.org
Thu Mar 11 16:47:06 CET 2004


On Tue, Mar 09, 2004 at 12:32:01AM -0500, D Richard Felker III wrote:

> > Algorithms exist to
> > shorten audio without changing the pitch so that only the tempo of
> > the music would change. I actually believe that it is the usual way
> > to convert film to PAL nowadays.
> 
> I don't think so. These butcher the quality even more.

When I compared the audio of the PAL and NTSC DVD versions of the same movie
(Star Trek Nemesis), I couldn't hear any difference in pitch. The PAL
version was shorter, so it was the audio that had been shortened rather
than extra fields being added to the video.
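
The basic trick is not hard to sketch either. The crudest pitch-preserving
method is plain overlap-add: read windows from the input with a slightly
larger hop than you write them out with, and crossfade the joins. Something
like this (an untested, hypothetical sketch with made-up names; real
converters presumably use WSOLA or a phase vocoder to avoid warbling):

/* Hypothetical sketch: shorten mono 16-bit PCM by 24/25 (film -> PAL)
 * without changing pitch, via naive overlap-add. Assumes nin >= WIN
 * and that out can hold at least nin samples. */
#include <string.h>

#define WIN 1024            /* window size in samples          */
#define HOP  768            /* output hop; overlap = WIN - HOP */

long shorten_24_to_25(const short *in, long nin, short *out)
{
    double ihop = HOP * 25.0 / 24.0;   /* input hop is 25/24 larger */
    double ipos = ihop;
    long   opos = HOP;

    memcpy(out, in, WIN * sizeof *out);     /* first window verbatim */

    while ((long)ipos + WIN <= nin) {
        const short *src = in + (long)ipos;
        int i;
        /* crossfade the overlap with what is already in the buffer */
        for (i = 0; i < WIN - HOP; i++) {
            double w = (double)i / (WIN - HOP);
            out[opos + i] = (short)((1.0 - w) * out[opos + i] + w * src[i]);
        }
        memcpy(out + opos + (WIN - HOP), src + (WIN - HOP),
               HOP * sizeof *out);
        opos += HOP;
        ipos += ihop;
    }
    return opos + (WIN - HOP);   /* number of output samples */
}

Each window is read 25/24 further along the input than it is written to the
output, so the result plays 24/25 as long at the original pitch.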

> > If even that is unacceptable then there is still one option: The video
> > could be converted from 24 to 25 fps with motion compensated temporal
> > resampling. Hardware to do that is probably not cheap, but if I'm not
> > completely misled, it at least exists.
> 
> I think you're completely misled. :)

How do they do the NTSC to PAL conversion then? Any NTSC-originated
interlaced video material shown on PAL TV must have been resampled from 60
to 50 fields per second, and if that's done by dropping fields, the result
would be uneven motion. If it's done with non-motion-compensated temporal
interpolation, then there must be ghosting. I haven't noticed either of
those artifacts on TV (actually I have, but from clearly different
origins), so the big boys in the TV broadcasting companies must use
either motion-compensated temporal resampling or magic. The former would
be somewhat easier to believe than the latter.
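
For comparison, the non-motion-compensated version is trivial to write
down, and it shows exactly where the ghosting comes from: every output
field whose instant falls between two input fields mixes two different
moments in time. A toy sketch (hypothetical, just printing the blend
weights, not how real standards converters work):

/* Sketch: linear temporal resampling of 60 -> 50 fields/s with no
 * motion compensation. Output field k sits at time k/50 and blends
 * the two 60 Hz input fields straddling that instant. Whenever the
 * weight is neither 0 nor 1, two moments in time are mixed: ghosting. */
#include <stdio.h>

int main(void)
{
    int k;
    for (k = 0; k < 5; k++) {       /* pattern repeats: 5 out per 6 in */
        double t = k / 50.0;        /* output field instant            */
        int    n = (int)(t * 60.0); /* earlier input field             */
        double w = t * 60.0 - n;    /* weight of the later field       */
        printf("out %d = %.1f * in %d + %.1f * in %d\n",
               k, 1.0 - w, n, w, n + 1);
    }
    return 0;
}

Only one output field in five lands exactly on an input field; the other
four are blends, which is the ghosting.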

> > Another similar problem is to fix progressive material that has been
> > NTSC-telecined and then deinterlaced. If the deinterlacing was done
> > by a method that uses just one field and invents the other, then the
> > fix is to simply drop the duplicate frame, but if the fields were
> > blended together, then the reconstruction of the frame that was split
> > between the interlaced frames is very similar to the ghost removal
> > problem.
> 
> Yep. I wrote some emails (or irc discussion) on this topic a while
> back, and actually worked out the math to do it.

I also worked it out, but I never implemented it. At the time I did
it, I had a big bunch of CIF mpegs captured from a telecined source, which
I wanted to transcode to mpeg4. Some were captured using one field only
and some by blending the fields together. Most were one field only, so
I did implement a filter to detect the pattern and drop the duplicate
frames. I didn't have enough energy to implement the other filter just
for a few files, and the uneven motion was less pronounced in the blended
files anyway, so I kept them at 29.97 fps.
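
For the record, the math itself is simple once the pattern phase is known.
The blend deinterlacer produced B = (A + C)/2, where A and C are
consecutive film frames, so with A known from a clean frame the hidden
frame comes back as C = 2B - A, per pixel. A hypothetical sketch (the
factor of two also doubles the noise, which is why the detection has to
be solid before applying it):

/* Recover the film frame hidden in a blend. If the pattern says frame
 * b is a 50/50 blend of the known clean frame a and the unknown frame
 * c, then c = 2*b - a per pixel, clamped to 8 bits. */
static unsigned char clamp255(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (unsigned char)v;
}

void unblend(const unsigned char *a,   /* clean previous frame  */
             const unsigned char *b,   /* blended frame (A+C)/2 */
             unsigned char *c,         /* reconstructed frame   */
             int npixels)
{
    int i;
    for (i = 0; i < npixels; i++)
        c[i] = clamp255(2 * b[i] - a[i]);
}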

> (It's more complicated than you think because you also have to DETECT
> the pattern, which is hard enough without the blend applied...)

Been there - done that. I know exactly how complicated the detection is,
because I needed it for my drop-every-fifth-frame filter. My mpegs
were quite noisy, so I had to develop a pretty foolproof solution, and
the resulting detection seems to work quite reliably for the blended
files too, dropping one of the originally interlaced frames. It could
easily be made even more reliable by tweaking it specifically for the
blended case.

My detection algorithm requires two passes. First I encode the video
uncompressed to /dev/null while running the first pass of the filter. It
measures the difference between each pair of frames using differencing
code stolen from the decimate filter and writes the results to a log file.
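
The per-frame metric is nothing fancy; a hypothetical version of it is
just a summed absolute luma difference, one number per frame pair,
appended to the log:

/* Sketch of the first pass: SAD between consecutive luma planes. In
 * 3:2-pulldown material every fifth value should be near zero (the
 * duplicated frame). Hypothetical code in the spirit of the decimate
 * filter's diff routine. */
#include <stdio.h>
#include <stdlib.h>

long long frame_sad(const unsigned char *cur, const unsigned char *prev,
                    int npixels)
{
    long long sad = 0;
    int i;
    for (i = 0; i < npixels; i++)
        sad += abs(cur[i] - prev[i]);
    return sad;
}

void log_diff(FILE *log, int frameno, long long sad)
{
    fprintf(log, "%d %lld\n", frameno, sad);
}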

For the final encoding I use the filter in second-pass mode, where it first
analyzes the whole log file to determine the phase of the pattern for
each 5-frame slice of the video. First it determines the phase only for
the areas where the pattern is strong enough to be fairly certain. Then it
interpolates the gaps, assuming that if the phase is the same at each end
of a gap, it probably stays constant over the gap. If the phase
changes during a gap, it analyzes the gap and determines the most
probable position for the phase change. If the beginning or end of
the video falls within a gap, it extrapolates the first or last
known phase. The filter then uses this data during the encoding to
drop one frame from each 5-frame slice.
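
In other words, inside each 5-frame slice the duplicate shows up as the
smallest logged difference, and a slice only gets a definite phase when
that minimum clearly stands out from the other four values. A hypothetical
sketch of the per-slice decision (RATIO is a made-up confidence threshold;
the gap interpolation then works on the runs of -1 between agreeing
slices):

/* Phase decision for one 5-frame slice of the diff log: the duplicate
 * frame should give the smallest difference. Return its position 0..4,
 * or -1 if the minimum does not stand out enough to be trusted. */
#define RATIO 4.0

int slice_phase(const long long *diffs, long slice)
{
    const long long *d = diffs + 5 * slice;
    int i, best = 0, second = 1;

    if (d[second] < d[best]) { best = 1; second = 0; }
    for (i = 2; i < 5; i++) {
        if (d[i] < d[best])        { second = best; best = i; }
        else if (d[i] < d[second]) { second = i; }
    }
    /* require the minimum to stand out from the rest of the slice */
    if ((double)d[second] > RATIO * (double)(d[best] + 1))
        return best;
    return -1;
}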

Even if the gap areas get guessed wrong, dropping the wrong frame
is not as bad as one might think, because those parts of the video have
very low motion or are very dark anyway, so any frame could be dropped
without making much difference.

The filter worked perfectly with -nosound, but I couldn't get mencoder
to do its frame duplication/skipping for audio sync consistently between
the two passes, so my final solution was to measure the frame differences
on the second pass too and compare them to the log data, so I could
immediately correct any mis-syncs that resulted from differently skipped
or duplicated frames.
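
A hypothetical sketch of that correction: compare the difference measured
now against the logged value for the frame the filter believes it is on,
and if a neighbouring log entry matches much better, assume mencoder
skipped or duplicated a frame and shift the log index (the factor of 4 is
a made-up threshold):

/* Detect a one-frame mis-sync between the two passes. If the diff
 * measured now matches the log entry for frame n+1 (or n-1) much
 * better than the entry for n, shift the log position accordingly. */
static long long err(long long a, long long b)
{
    return a > b ? a - b : b - a;
}

long resync(const long long *logdiff, long nlog, long n, long long measured)
{
    long long e0 = err(measured, logdiff[n]);

    if (n + 1 < nlog && err(measured, logdiff[n + 1]) * 4 < e0)
        return n + 1;               /* a frame was skipped    */
    if (n > 0 && err(measured, logdiff[n - 1]) * 4 < e0)
        return n - 1;               /* a frame was duplicated */
    return n;
}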

-- 
 Ville