[FFmpeg-devel] [GSoC] Motion Interpolation

Michael Niedermayer michael at niedermayer.cc
Mon Jun 20 13:02:21 CEST 2016

On Mon, Jun 20, 2016 at 09:54:15AM +0000, Davinder Singh wrote:
> On Sat, Jun 18, 2016 at 3:16 AM Michael Niedermayer <michael at niedermayer.cc>
> wrote:
> > On Fri, Jun 17, 2016 at 08:19:00AM +0000, Davinder Singh wrote:
> > [...]
> > > Yes, I did that, after understanding it completely. It now works with the
> > > motion vectors generated by mEstimate filter. Now I’m trying to improve
> > it
> > > based on this paper: Overlapped Block Motion Compensation: An
> > > Estimation-Theoretic Approach
> >
> > > <
> > http://citeseerx.ist.psu.edu/viewdoc/download?doi=
> > >
> >
> > this is 22 years old
> >
> >
> > > and
> > > this one: Window Motion Compensation
> > > <https://www.researchgate.net/publication/252182199>.Takes a lot of time
> >
> > this is 25 years old
> >
> > not saying old papers are bad, just that this represents the knowledge
> > of 20 years ago
> >
> > also its important to keep in mind that blind block matching of any
> > metric will not be enough. To find true motion the whole motion
> > vector fields of multiple frames will need to be considered
> >
> > For example a ball thrown accross the field of view entering and
> > exiting the picture needs to move smoothly and at the ends (in time)
> > there are frames without the ball then a frame with the ball
> > these 2 are not enough to interpolate the frames between as we have
> > just one location where the ball is. With the next frames though
> > we can find the motion trajectory of the ball and interpolate it end
> > to end
> >
> > I think papers which work on problems like this and also interpolation
> > of all the areas that end up overlapping and covering each other
> > like the backgroud behind the ball in that example would be better
> > starting points for implementing motion estiation because ultimatly
> > that is the kind of ME code we would like to have.
> > Block matching with various windows, OBMC, ... are all good but
> > if in our example the vectors for the ball or background are off that
> > will look rather bad with any motion compensation
> > So trying to move a bit toward this would make sense but first
> > having some motion estimation even really basic and dumb with
> > mc working in a testable filter (pair) should probably be done.
> > Iam just mentioning this as a bit of a preview of what i hope could
> > eventually be implemented, maybe this would be after GSoC but its
> > the kind of code needed to have really usable frame interpolation
> >
> >
> >
> > > reading them. I think we need to add new Raised Cosine window (weights)
> > > along with Linear Window (currently implemented). What do you say?
> >
> > i dont know, the windows used in snow are already the best of several
> > tried (for snow).
> > no great gains will be found by changing the OBMC window from snow.
> >
> >
> > >
> > > Also making mInterpolate work with variable macroblock size MC. The
> > current
> > > interpolation works without half pel accuracy, though.
> >
> > mcfps has fully working 1/4 pel OBMC code, that should be fine to be
> > used as is i think unless i miss something
> >
> > half pel is 20 years old, it is not usefull
> > multiple block sizes on the MC side should not really matter ATM
> > smaller blocks are a bit slower but first we should get the code
> > working, then working with good quality and then working fast.
> >
> > multiple block sizes may be usefull for the estimation side if it
> > improves estimation somehow.
> >
> > Can i see your current "work in progress" ?
> >
> >
> > [...]
> > > I’m moving estimation code to some new file motion_est.c file and the
> > > methods are shared by both mEstimate and mInterpolate filters. mEstimate
> > > store the MVs in frame’s side data for any other filter. Moreover, any
> > > other filter if need post processing on MVs it can directly use the
> > shared
> > > methods. But, mInterpolate use them internally, no saving in sidedata,
> > and
> > > saving unnecessary processing.
> >
> > This design sounds good
> >
> >
> > >
> > >
> > > Also, Paper [1] doesn’t uses window with OBMC at all. It just find normal
> > > average without weight. Perhaps to compare papers I either need to add
> > > multiple option for each setting or need to assign the algorithm as
> > > researcher’s name in filter options.
> >
> >
> >
> Paper [1] and [2] uses functions or do post processing on motion vectors,
> so needs fast ME algorithms, which currently I’m working on. [*M]
> Let me summarize the papers (from Email 1, this thread):
> Paper [1]: Zhai et al. (2005) A Low Complexity Motion Compensated Frame
> Interpolation Method
> [Quote]
> This paper propose a MCFI method intended for real time processing. It
> first examines the motion vectors in the bitstream [*1]. 8x8 block size is
> used rather than 16x16 as in most cases; Using smaller block size leads to
> denser motion field, so neighboring MVs are more highly correlated, so
> prediction is better. To reduce complexity, MVs in bitstream are utilized
> [*1]. But need to be filtered as not all of them represent true motion.
> They are grouped into “good vectors, can be used directly” and “bad
> vectors, need to find true motion”. For classification of MVs into groups,
> SAD and BAD is used. For an 8x8 block in to-be-interpolated frame F(in) we
> get motion vector MV of block at same location in current frame. If F(in)
> is exactly middle of F(prev) and F(cur), then MV/2 points to avblock in
> prev frame & -MV/2 points to a block in current frame from F(in). Then SAD
> & BAD of both of these blocks are compared to certain thresholds [*2],
> based on which blocks are classified. For bad ones, overlapped block
> bi-directional motion estimation (OBBME) is carried out to find true
> motion. In OBBME, the size of block in F(in) is enlarged to 12x12, then
> bi-directional ME is performed to get MV that minimizes the diff. between
> two block located at MV/2 & -MV/2 in F(prev) & F(cur) wrt current block in
> F(in). Diff is calc by eq (1) in Paper. Like in BMA, we can use any fast ME
> algo here [*M]. After this, there are still few MVs. For that post
> processing is performed on MVs that break the continuity. We calculate the
> variation of each motion vector and its neighboring MVs. If variation
> exceeds a certain threshold, the MV is regarded as a single bad motion
> vector and then vector median filtering is applied. It finds one vector
> among 8, that minimizes eq (2). Finally, OBMC is applied. No weights are
> used [*3]. Pixels are simple averages given by eq 4-6.
> [/Qoute]
> [*1] We can for now use motion vectors generated on filter side. As you
> suggested, later we can use decoder’s vectors.
> [*2] Threshold values are not given in paper :(
> [*3] Initially, we can test using the generated/refined motion vector field
> with the currently implemented window based OBMC. Later to reduce
> complexity we can use their method.
> Paper [2]: Choi et al. (2007) Motion-Compensated Frame Interpolation Using
> Bilateral Motion Estimation and Adaptive Overlapped Block Motion
> Compensation
> [Quote]
> This algorithm has four steps. First, we propose the bilateral ME scheme to
> obtain the motion field of an interpolated frame. Then, we partition a
> frame into several object regions by clustering MVs. We apply the
> variable-size block MC (VS-BMC) to object boundaries in order to
> reconstruct edge information with a higher quality. Finally, we use the
> adaptive overlapped block MC, which adjusts the coefficients of overlapped
> windows based on the reliabilities of neighboring MVs. The adaptive OBMC
> (AOBMC) can overcome the limitations of the conventional OBMC, such as
> over-smoothing and poor de-blocking.
> I. We perform bilateral ME which prevents overlapping and hole problem [*4]
> by estimating the motion vectors of interpolated frame directly. If the
> conventional BMA is used to find a block-wise motion vector field between
> the previous frame and the current frame, the motion trajectories may not
> cover all pixels in the interpolated frame, consequently yielding hole
> regions. In addition, multiple trajectories may pass through the same
> pixel, causing overlapping regions. Therefore, we should estimate the
> motion vectors for the blocks in the interpolated frame, instead of using
> the motion vectors between the previous frame and the current frame. In
> proposed Bilateral ME we obtain the MV by comparing a block at a shifted
> position in the F(prev) and another block at the opposite position in
> F(curr), by minimizing SAD [*5][*M]. Since there can be multiple
> trajectories through the current block, we impose a spatial smoothness
> constraint to improve robustness of ME. The SMD is calc which is avg.
> between abs. px. values at boundary of predicted and neighboring block. We
> find best MV by minimizing weighted sum of SAD & SMD given by eq (6).
> II. Then MBs are classified into clusters according to MVs. [TL;DR] First,
> all MBs are considered as single object and cluster center is set to avg.
> MV of blocks. If diff b/w block's MV and threshold T (=8), the block
> belongs to new object. The avg. MV of the blocks in new object is set as a
> new cluster center. Each cluster center is updated to the avg. of MVs in
> the cluster. Steps 2–4 are iteratively repeated until there is no change in
> the cluster centers. [/TL;DR]
> III. To express complex motions, we adopt VS-BMC to reconstruct boundary
> blocks. We adopt a quadtree-based VS-BMC, which divides an 8x8 boundary
> block into 4x4 or 2x2 sub-blocks. [*6] Then we find MVs for sub-block using
> SAD like before, if new MV is less than 1/4MV of orig. block, accept the
> division - iterate it by subdividing it furthermore, or terminate procedure.
> IV. Finally adaptive OBMC is used with window such as raised cosine [*7].
> Conventional OBMC can yield blurring or over-smoothing artifacts. AOBMC
> reconstructs the interpolated frame faithfully by controlling the weights
> of overlapping windows according to the reliabilities of MVs. See Fig. 6 &
> 7.
> [/Quote]

> [*M] The shared methods from motion_est.c will allow this without
> repetition of code.

Just keep in mind the motion estimation we have is a bit mpeg centric
so block sizes below 8x8 will not work with all the routines we have

> [*5] It is very similar to OBBME used in Paper [1] except the block size is
> not changed.
> [*6] This required the current mInterpolate code to support variable size

> [*7] We could use the linear window instead of raised cosine one. But too
> late, I already implemented it.


> Another interesting paper I found is 3D recursive search. It's little old
> but very popular. See images here:
> http://i65.tinypic.com/zkfgox.png
> http://i67.tinypic.com/2dihmb7.png
> http://i65.tinypic.com/rgw38n.png

one thing that is very noticable on this though is that what they
use as comparission (full search) in these 3 images is alot worse than
what modern encoders use (rate distortion based predictive zonal ME)
this shouldnt matter much but i wanted to point out that its not
possible from this to conclude how these relate to what a modern
video encoder would use as "full search"

> Paper [3]: de Haan et al. (1994) True Motion Estimation with 3D Recursive
> Search Block Matching
> (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=246088)
> Gonna read now. It has unusual notation.
> Once we implement these, then we can deal with objects entering or exiting
> the screen. I think it is hole or overlapping problem addressed in paper
> [2], several approaches have been proposed to handle it like median
> filtering, spatial interpolation (
> http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=389461) or MC using
> neighboring motion fields. Will look into it more. The hole or overlapping
> problem is handled by bilateral motion estimation used in paper [2] (*4).

> Also have to handle scene changing issues. I read in some paper that they

yes, scene changing will need to be handled too, it was a problem in
mcfps too
the quick solution is probably to just detect by some threshold that
there is a scene change and then set all MVs to 0,0 that will look
alot better than random bits of images randomly moving and merging
into each other

> are too computational expensive.
> Which one do you think we should start with? I think it should be 3DRS.
> 3DRS is fastest of these three. Paper 2 compares result of all these three.
> 3DRS is around 16fps, [1] is ~7fps. [2] is ~3fps. Paper 2 outperforms both
> of them.

is the full text of paper 2 available somewhere ?

also motion trajectories should be interpolated through more than
2 frames, i dont know if the quoted papers do that but
vf_mcfps already provides the framework for this (aka its neccessary
to have 2 future and past frames available)
a random paper which seems to compare linear vs cubic shows very
significant gains
I dont know if that paper is good or not but for example a ball (to
keep the example used previously) would move along the edges of a
polygon if linear MV interpolation is used, That might work with a
slow moving ball but a spinning wheel should be heavily distorted
with interpolation along straight lines.

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

it is not once nor twice but times without number that the same ideas make
their appearance in the world. -- Aristotle
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20160620/3b0640c2/attachment.sig>

More information about the ffmpeg-devel mailing list