[FFmpeg-devel] [GSoC] Motion Interpolation

Wed Jun 22 22:37:32 CEST 2016

On Mon, Jun 20, 2016 at 4:33 PM Michael Niedermayer <michael at niedermayer.cc>
wrote:

> On Mon, Jun 20, 2016 at 09:54:15AM +0000, Davinder Singh wrote:
> > On Sat, Jun 18, 2016 at 3:16 AM Michael Niedermayer <michael at niedermayer.cc>
> > wrote:
> >
> > > On Fri, Jun 17, 2016 at 08:19:00AM +0000, Davinder Singh wrote:
> > > [...]
> > > > Yes, I did that, after understanding it completely. It now works with
> > > > the motion vectors generated by the mEstimate filter. Now I’m trying to
> > > > improve it based on this paper: Overlapped Block Motion Compensation: An
> > > > Estimation-Theoretic Approach
> > > > <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.112.8359&rep=rep1&type=pdf>
> > >
> > > this is 22 years old
> > >
> > >
> > > > and this one: Window Motion Compensation
> > > > <https://www.researchgate.net/publication/252182199>. Takes a lot of
> > > > time
> > >
> > > this is 25 years old
> > >
> > > not saying old papers are bad, just that this represents the knowledge
> > > of 20 years ago
> > >
> > > also it's important to keep in mind that blind block matching with any
> > > metric will not be enough. To find true motion, the whole motion
> > > vector fields of multiple frames will need to be considered.
> > >
> > > For example a ball thrown across the field of view entering and
> > > exiting the picture needs to move smoothly and at the ends (in time)
> > > there are frames without the ball then a frame with the ball
> > > these 2 are not enough to interpolate the frames between as we have
> > > just one location where the ball is. With the next frames though
> > > we can find the motion trajectory of the ball and interpolate it end
> > > to end
> > >
> > > I think papers which work on problems like this, and also on interpolation
> > > of all the areas that end up overlapping and covering each other
> > > (like the background behind the ball in that example), would be better
> > > starting points for implementing motion estimation, because ultimately
> > > that is the kind of ME code we would like to have.
> > > Block matching with various windows, OBMC, ... are all good but
> > > if in our example the vectors for the ball or background are off that
> > > will look rather bad with any motion compensation
> > > So trying to move a bit toward this would make sense but first
> > > having some motion estimation even really basic and dumb with
> > > mc working in a testable filter (pair) should probably be done.
> > > I am just mentioning this as a bit of a preview of what I hope could
> > > eventually be implemented; maybe this would be after GSoC, but it's
> > > the kind of code needed to have really usable frame interpolation.
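The "really basic and dumb" motion estimation mentioned above could be as simple as an exhaustive block search that minimizes SAD. The sketch below assumes 8-bit luma planes that share one stride. The function names (`block_sad`, `full_search`) are hypothetical and are not taken from FFmpeg or vf_mcfps.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two bs x bs blocks sharing a stride. */
static int block_sad(const uint8_t *a, const uint8_t *b, int stride, int bs)
{
    int sad = 0;
    for (int y = 0; y < bs; y++)
        for (int x = 0; x < bs; x++)
            sad += abs(a[y * stride + x] - b[y * stride + x]);
    return sad;
}

/* Exhaustive search: find (*mvx, *mvy) within +-range such that the
 * bs x bs block at (bx, by) in cur best matches the block at
 * (bx + *mvx, by + *mvy) in prev. Candidates leaving the frame are skipped.
 * Returns the best SAD (INT_MAX if no candidate fits). */
static int full_search(const uint8_t *cur, const uint8_t *prev, int stride,
                       int width, int height, int bx, int by, int bs,
                       int range, int *mvx, int *mvy)
{
    int best = INT_MAX;
    *mvx = *mvy = 0;
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int x = bx + dx, y = by + dy;
            if (x < 0 || y < 0 || x + bs > width || y + bs > height)
                continue;
            int sad = block_sad(cur + by * stride + bx,
                                prev + y * stride + x, stride, bs);
            if (sad < best) {
                best = sad;
                *mvx = dx;
                *mvy = dy;
            }
        }
    }
    return best;
}
```

Candidates whose block would leave the frame are skipped, so border blocks get fewer candidates. A real filter would more likely pad or clamp the reference frame instead.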
> > >
> > >
> > >
> > > > reading them. I think we need to add a new Raised Cosine window
> > > > (weights) along with the Linear Window (currently implemented). What do
> > > > you say?
> > >
> > > I don't know, the windows used in snow are already the best of several
> > > tried (for snow).
> > > No great gains will be found by changing the OBMC window from snow.
> > >
> > >
> > > >
> > > > Also making mInterpolate work with variable macroblock size MC. The
> > > > current interpolation does not support half pel accuracy, though.
> > >
> > > mcfps has fully working 1/4 pel OBMC code; that should be fine to be
> > > used as is, I think, unless I'm missing something.
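The sketch below shows what sampling at 1/4-pel positions involves. It fetches a pixel at a quarter-pel offset using plain bilinear interpolation with edge clamping. Bilinear is an assumption chosen for brevity. It is not the filter used by vf_mcfps, nor the one used by H.264, which applies a 6-tap filter at half-pel positions. The function names are hypothetical.

```c
#include <stdint.h>

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Floor division by 4 that is well defined for negative values. */
static int floor_div4(int v)
{
    return v >= 0 ? v / 4 : -((-v + 3) / 4);
}

/* Sample an 8-bit plane at (qx / 4, qy / 4), i.e. quarter-pel units, by
 * bilinear interpolation of the four surrounding pixels; coordinates outside
 * the plane are clamped to the nearest edge pixel. */
static int sample_qpel_bilinear(const uint8_t *src, int stride, int width,
                                int height, int qx, int qy)
{
    int x0 = floor_div4(qx), y0 = floor_div4(qy);
    int fx = qx - 4 * x0, fy = qy - 4 * y0;           /* fractions, 0..3 */
    int xa = clamp_int(x0, 0, width - 1), xb = clamp_int(x0 + 1, 0, width - 1);
    int ya = clamp_int(y0, 0, height - 1), yb = clamp_int(y0 + 1, 0, height - 1);
    int a = src[ya * stride + xa], b = src[ya * stride + xb];
    int c = src[yb * stride + xa], d = src[yb * stride + xb];
    /* The four weights sum to 16; adding 8 rounds to nearest. */
    return ((4 - fx) * (4 - fy) * a + fx * (4 - fy) * b +
            (4 - fx) * fy * c + fx * fy * d + 8) >> 4;
}
```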
> > >
> > > half pel is 20 years old, it is not useful
> > > multiple block sizes on the MC side should not really matter ATM
> > > smaller blocks are a bit slower but first we should get the code
> > > working, then working with good quality and then working fast.
> > >
> > > multiple block sizes may be useful for the estimation side if it
> > > improves estimation somehow.
> > >
> > > Can I see your current "work in progress"?
> > >
> > >
> > > [...]
> > > > I’m moving the estimation code to a new file, motion_est.c, and the
> > > > methods are shared by both the mEstimate and mInterpolate filters.
> > > > mEstimate stores the MVs in the frame’s side data for any other filter.
> > > > Moreover, any other filter that needs post-processing on MVs can
> > > > directly use the shared methods. mInterpolate, however, uses them
> > > > internally, without saving them in side data, which avoids unnecessary
> > > > processing.
> > >
> > > This design sounds good
> > >
> > >
> > > >
> > > >
> > > > Also, Paper [1] doesn’t use a window with OBMC at all. It just computes
> > > > a plain average without weights. Perhaps, to compare papers, I either
> > > > need to add multiple options for each setting or need to name the
> > > > algorithms after the researchers in the filter options.
> > >
> > >
> > >
> > Papers [1] and [2] apply functions or do post-processing on motion
> > vectors, so they need fast ME algorithms, which I’m currently working on.
> > [*M]
> >
> > Let me summarize the papers (from Email 1, this thread):
> >
> > Paper [1]: Zhai et al. (2005) A Low Complexity Motion Compensated Frame
> > Interpolation Method
> >
> > [Quote]
> > This paper proposes an MCFI method intended for real-time processing. It
> > first examines the motion vectors in the bitstream [*1]. An 8x8 block size
> > is used rather than 16x16 as in most cases; using a smaller block size
> > leads to a denser motion field, so neighboring MVs are more highly
> > correlated and prediction is better. To reduce complexity, the MVs in the
> > bitstream are utilized [*1], but they need to be filtered, as not all of
> > them represent true motion. They are grouped into “good vectors, can be
> > used directly” and “bad vectors, need to find true motion”. For the
> > classification of MVs into groups, SAD and BAD are used. For an 8x8 block
> > in the to-be-interpolated frame F(in), we get the motion vector MV of the
> > block at the same location in the current frame. If F(in) is exactly in
> > the middle of F(prev) and F(cur), then MV/2 points to a block in the
> > previous frame and -MV/2 points to a block in the current frame from
> > F(in). Then the SAD and BAD of these two blocks are compared to certain
> > thresholds [*2], based on which the blocks are classified. For bad ones,
> > overlapped block bi-directional motion estimation (OBBME) is carried out
> > to find the true motion. In OBBME, the size of the block in F(in) is
> > enlarged to 12x12, then bi-directional ME is performed to get the MV that
> > minimizes the difference between the two blocks located at MV/2 and -MV/2
> > in F(prev) and F(cur) with respect to the current block in F(in). The
> > difference is calculated by eq. (1) in the paper. As in BMA, we can use
> > any fast ME algorithm here [*M]. After this, a few bad MVs still remain.
> > For those, post-processing is performed on MVs that break the continuity:
> > we calculate the variation of each motion vector with respect to its
> > neighboring MVs. If the variation exceeds a certain threshold, the MV is
> > regarded as a single bad motion vector and vector median filtering is
> > applied. It finds the one vector among the 8 that minimizes eq. (2).
> > Finally, OBMC is applied. No weights are used [*3]. Pixels are simple
> > averages given by eqs. (4)-(6).
> > [/Quote]
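The classification and OBBME steps summarized above can be sketched as follows. The sketch makes several assumptions:

* Vectors are integer-pel, and odd MVs are truncated toward zero when halved.
* Only the SAD criterion is used; the BAD check is omitted.
* The threshold is supplied by the caller, since [*2] notes the paper gives none.

All names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Bilateral SAD of a size x size window whose top-left corner is (x, y) in
 * the frame being interpolated: the block displaced by +(hx, hy) in prev is
 * compared with the block displaced by -(hx, hy) in cur, i.e. MV/2 and -MV/2
 * when (hx, hy) = MV / 2. Returns INT_MAX if either block leaves the frame. */
static int bilateral_sad(const uint8_t *prev, const uint8_t *cur, int stride,
                         int w, int h, int x, int y, int size, int hx, int hy)
{
    int px = x + hx, py = y + hy, cx = x - hx, cy = y - hy;
    if (px < 0 || py < 0 || cx < 0 || cy < 0 ||
        px + size > w || py + size > h || cx + size > w || cy + size > h)
        return INT_MAX;
    int sad = 0;
    for (int j = 0; j < size; j++)
        for (int i = 0; i < size; i++)
            sad += abs(prev[(py + j) * stride + px + i] -
                       cur[(cy + j) * stride + cx + i]);
    return sad;
}

/* Classification: an MV (e.g. from the bitstream) is "good" if the 8x8
 * bilateral SAD at +-MV/2 is below threshold. Odd components are truncated
 * toward zero by the integer halving. */
static int mv_is_good(const uint8_t *prev, const uint8_t *cur, int stride,
                      int w, int h, int bx, int by, int mvx, int mvy,
                      int threshold)
{
    return bilateral_sad(prev, cur, stride, w, h, bx, by, 8,
                         mvx / 2, mvy / 2) < threshold;
}

/* OBBME-style search for a bad block: the 8x8 block at (bx, by) is enlarged
 * by 2 pixels on each side to 12x12, and the half-vector within +-range that
 * minimises the bilateral SAD is returned in (*hx, *hy). */
static int obbme_search(const uint8_t *prev, const uint8_t *cur, int stride,
                        int w, int h, int bx, int by, int range,
                        int *hx, int *hy)
{
    int best = INT_MAX;
    *hx = *hy = 0;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            int sad = bilateral_sad(prev, cur, stride, w, h, bx - 2, by - 2,
                                    12, dx, dy);
            if (sad < best) {
                best = sad;
                *hx = dx;
                *hy = dy;
            }
        }
    return best;
}
```

The full MV is 2 * (*hx, *hy). A real implementation would search at sub-pel precision. It could also replace the exhaustive loop with one of the fast ME methods planned for motion_est.c [*M].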
> >
> > [*1] For now we can use motion vectors generated on the filter side. As you
> > suggested, later we can use the decoder’s vectors.
> > [*2] Threshold values are not given in the paper :(
> > [*3] Initially, we can test using the generated/refined motion vector field
> > with the currently implemented window-based OBMC. Later, to reduce
> > complexity, we can use their method.
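The post-processing step could work as follows. First, measure the variation of a vector against its 8 neighbors. If that variation exceeds a threshold, replace the vector with the vector median of the neighbors. Eq. (2) of the paper is not reproduced here. The L1 distance and the threshold are assumptions, and the names are hypothetical.

```c
#include <stdlib.h>

typedef struct { int x, y; } MV;

static int mv_l1(MV a, MV b)
{
    return abs(a.x - b.x) + abs(a.y - b.y);
}

/* Vector median: the member of v[0..n-1] with the smallest total L1
 * distance to all members of the set. */
static MV vector_median(const MV *v, int n)
{
    int best = 0;
    long best_cost = -1;
    for (int i = 0; i < n; i++) {
        long cost = 0;
        for (int j = 0; j < n; j++)
            cost += mv_l1(v[i], v[j]);
        if (best_cost < 0 || cost < best_cost) {
            best_cost = cost;
            best = i;
        }
    }
    return v[best];
}

/* If the summed L1 variation of center against its 8 neighbours exceeds
 * threshold, treat it as a single bad vector and return the vector median of
 * the neighbours; otherwise keep it. */
static MV smooth_mv(MV center, const MV nb[8], int threshold)
{
    int variation = 0;
    for (int i = 0; i < 8; i++)
        variation += mv_l1(center, nb[i]);
    return variation > threshold ? vector_median(nb, 8) : center;
}
```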
> >
> >
> > Paper [2]: Choi et al. (2007) Motion-Compensated Frame Interpolation Using
> > Bilateral Motion Estimation and Adaptive Overlapped Block Motion
> > Compensation
> >
> > [Quote]
> > This algorithm has four steps. First, we propose the bilateral ME scheme
> to
> > obtain the motion field of an interpolated frame. Then, we partition a
> > frame into several object regions by clustering MVs. We apply the
> > variable-size block MC (VS-BMC) to object boundaries in order to
> > reconstruct edge information with a higher quality. Finally, we use the
> > adaptive overlapped block MC, which adjusts the coefficients of
> overlapped
> > windows based on the reliabilities of neighboring MVs. The adaptive OBMC
> > (AOBMC) can overcome the limitations of the conventional OBMC, such as
> > over-smoothing and poor de-blocking.
> > I. We perform bilateral ME, which prevents the overlapping and hole
> > problems [*4] by estimating the motion vectors of the interpolated frame
> > directly. If the conventional BMA is used to find a block-wise motion
> > vector field between the previous frame and the current frame, the motion
> > trajectories may not cover all pixels in the interpolated frame,
> > consequently yielding hole regions. In addition, multiple trajectories may
> > pass through the same pixel, causing overlapping regions. Therefore, we
> > should estimate the motion vectors for the blocks in the interpolated
> > frame, instead of using the motion vectors between the previous frame and
> > the current frame. In the proposed bilateral ME, we obtain the MV by
> > comparing a block at a shifted position in F(prev) and another block at
> > the opposite position in F(curr), minimizing the SAD [*5][*M]. Since
> > there can be multiple trajectories through the current block, we impose a
> > spatial smoothness constraint to improve the robustness of ME. The side
> > match distortion (SMD) is calculated as the average absolute difference
> > between pixel values at the boundary of the predicted block and its
> > neighboring blocks. We find the best MV by minimizing the weighted sum of
> > SAD and SMD given by eq. (6).
> > II. Then MBs are classified into clusters according to their MVs. [TL;DR]
> > First, all MBs are considered a single object and the cluster center is
> > set to the average MV of the blocks. If the difference between a block's
> > MV and the cluster center exceeds a threshold T (=8), the block belongs
> > to a new object. The average MV of the blocks in the new object is set as
> > a new cluster center. Each cluster center is updated to the average of
> > the MVs in the cluster. Steps 2–4 are iteratively repeated until there is
> > no change in the cluster centers. [/TL;DR]
> > III. To express complex motions, we adopt VS-BMC to reconstruct boundary
> > blocks. We adopt a quadtree-based VS-BMC, which divides an 8x8 boundary
> > block into 4x4 or 2x2 sub-blocks [*6]. Then we find MVs for the
> > sub-blocks using SAD as before; if the new MV is less than 1/4 MV of the
> > original block, the division is accepted and the sub-blocks are
> > subdivided further; otherwise the procedure terminates.
> > IV. Finally, adaptive OBMC is used with a window such as raised cosine
> > [*7]. Conventional OBMC can yield blurring or over-smoothing artifacts.
> > AOBMC reconstructs the interpolated frame faithfully by controlling the
> > weights of the overlapping windows according to the reliabilities of the
> > MVs. See Figs. 6 and 7.
> > [/Quote]
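The clustering of step II could be sketched roughly as below. This is a simplified variant, not the paper's exact procedure:

* A block farther than T from every existing center seeds a new cluster at its own MV. Distance is measured in L1, which is an assumption.
* The assignment and update passes then repeat, k-means style, until no label changes.

The names and the cluster cap are hypothetical.

```c
#include <math.h>

#define MAX_CLUSTERS 16

typedef struct { int x, y; } MV;

/* Clusters n (> 0) block MVs. label[i] receives the cluster of block i;
 * cx/cy receive the cluster centres (mean MVs). Returns the cluster count. */
static int cluster_mvs(const MV *mv, int n, double T, int *label,
                       double cx[MAX_CLUSTERS], double cy[MAX_CLUSTERS])
{
    int k = 1;

    /* Start with one object whose centre is the average MV of all blocks. */
    cx[0] = cy[0] = 0.0;
    for (int i = 0; i < n; i++) {
        cx[0] += mv[i].x;
        cy[0] += mv[i].y;
        label[i] = -1;
    }
    cx[0] /= n;
    cy[0] /= n;

    for (int iter = 0; iter < 100; iter++) {
        int changed = 0;

        /* Assign each block to its nearest centre, or open a new cluster
         * when even the nearest centre is farther than T. */
        for (int i = 0; i < n; i++) {
            int best = 0;
            double best_d = INFINITY;
            for (int c = 0; c < k; c++) {
                double d = fabs(mv[i].x - cx[c]) + fabs(mv[i].y - cy[c]);
                if (d < best_d) {
                    best_d = d;
                    best = c;
                }
            }
            if (best_d > T && k < MAX_CLUSTERS) {
                best = k++;
                cx[best] = mv[i].x;
                cy[best] = mv[i].y;
            }
            if (label[i] != best) {
                label[i] = best;
                changed = 1;
            }
        }

        /* Move each centre to the mean MV of its members; empty clusters
         * keep their previous centre. */
        for (int c = 0; c < k; c++) {
            double sx = 0.0, sy = 0.0;
            int cnt = 0;
            for (int i = 0; i < n; i++)
                if (label[i] == c) {
                    sx += mv[i].x;
                    sy += mv[i].y;
                    cnt++;
                }
            if (cnt) {
                cx[c] = sx / cnt;
                cy[c] = sy / cnt;
            }
        }

        if (!changed)
            break;
    }
    return k;
}
```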
> >
>
> > [*M] The shared methods from motion_est.c will allow this without
> > repetition of code.
>
> Just keep in mind the motion estimation we have is a bit mpeg centric
> so block sizes below 8x8 will not work with all the routines we have
>
>
> > [*5] It is very similar to the OBBME used in Paper [1], except that the
> > block size is not changed.
> > [*6] This requires the current mInterpolate code to support variable-size
> > OBMC.
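As [*5] notes, the bilateral search is essentially OBBME, so what differs is the cost function of step I. The sketch below computes that cost as SAD plus a weighted side match distortion (SMD) term. It makes these assumptions:

* Blocks are processed in raster order, so the top and left neighbors in the interpolated frame `interp` already exist.
* The prediction is the rounded average of the two displaced blocks.
* The weight `lambda` is a free parameter, because eq. (6) is not reproduced here.

Callers must keep both displaced blocks inside the frame. All names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Side match distortion: mean absolute difference between the border pixels
 * of the bilateral prediction of the bs x bs block at (bx, by) and the
 * adjacent, already interpolated pixels above and to the left in interp. */
static double side_match_distortion(const uint8_t *prev, const uint8_t *cur,
                                    const uint8_t *interp, int stride,
                                    int bx, int by, int bs, int hx, int hy)
{
    long sum = 0;
    int cnt = 0;
    for (int i = 0; i < bs; i++) {
        if (by > 0) { /* top row against the row above */
            int p = (prev[(by + hy) * stride + bx + hx + i] +
                     cur[(by - hy) * stride + bx - hx + i] + 1) / 2;
            sum += abs(p - interp[(by - 1) * stride + bx + i]);
            cnt++;
        }
        if (bx > 0) { /* left column against the column to the left */
            int p = (prev[(by + hy + i) * stride + bx + hx] +
                     cur[(by - hy + i) * stride + bx - hx] + 1) / 2;
            sum += abs(p - interp[(by + i) * stride + bx - 1]);
            cnt++;
        }
    }
    return cnt ? (double)sum / cnt : 0.0;
}

/* Bilateral cost for half-vector (hx, hy): SAD between the block at
 * +(hx, hy) in prev and the block at -(hx, hy) in cur, plus lambda * SMD. */
static double bilateral_cost(const uint8_t *prev, const uint8_t *cur,
                             const uint8_t *interp, int stride, int bx, int by,
                             int bs, int hx, int hy, double lambda)
{
    int sad = 0;
    for (int j = 0; j < bs; j++)
        for (int i = 0; i < bs; i++)
            sad += abs(prev[(by + hy + j) * stride + bx + hx + i] -
                       cur[(by - hy + j) * stride + bx - hx + i]);
    return sad + lambda * side_match_distortion(prev, cur, interp, stride,
                                                bx, by, bs, hx, hy);
}
```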
>
> > [*7] We could use the linear window instead of the raised cosine one. But
> > too late, I already implemented it.
>
> :)
>
>
> >
> > Another interesting paper I found is 3D recursive search. It's a little
> > old but very popular. See images here:
> > http://i65.tinypic.com/zkfgox.png
> > http://i67.tinypic.com/2dihmb7.png
> > http://i65.tinypic.com/rgw38n.png
>
> interesting
> one thing that is very noticeable on this though is that what they
> use as comparison (full search) in these 3 images is a lot worse than
> what modern encoders use (rate distortion based predictive zonal ME).
> This shouldn't matter much but I wanted to point out that it's not
> possible from this to conclude how these relate to what a modern
> video encoder would use as "full search"
>

All the videos they use in the papers are available here:
https://media.xiph.org/video/derf/
They can be used to compare 3DRS against the MVs generated by EPZS in a
modern codec. Could the vectors exported with +export_mvs be the EPZS ones?

>
> >
> > Paper [3]: de Haan et al. (1994) True Motion Estimation with 3D Recursive
> > Search Block Matching
> > (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=246088)
> > Going to read it now; it uses unusual notation.
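The sketch below gives an idea of why 3DRS is cheap. It is a heavily simplified version of the candidate-based search. Instead of testing every vector in a window, each block evaluates only a handful of candidates and keeps the one with the lowest SAD:

* the zero vector,
* spatial predictors from already estimated neighbors,
* those predictors plus a small update vector,
* a temporal predictor from the previous vector field.

The version in de Haan et al. differs in predictor positions and adds penalties on update candidates, among other things. The update table and all names here are assumptions.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { int x, y; } MV;

/* SAD between the bs x bs block at pixel (px, py) in cur and the block
 * displaced by v in prev; INT_MAX if the displaced block leaves the frame. */
static int candidate_sad(const uint8_t *cur, const uint8_t *prev, int stride,
                         int w, int h, int px, int py, int bs, MV v)
{
    int x = px + v.x, y = py + v.y;
    if (x < 0 || y < 0 || x + bs > w || y + bs > h)
        return INT_MAX;
    int sad = 0;
    for (int j = 0; j < bs; j++)
        for (int i = 0; i < bs; i++)
            sad += abs(cur[(py + j) * stride + px + i] -
                       prev[(y + j) * stride + x + i]);
    return sad;
}

/* Simplified 3DRS-style pass over a (w / bs) x (h / bs) block grid in raster
 * order. prev_mv is the previous frame's vector field (may be NULL); the
 * result is written to mv. */
static void rs3d_estimate(const uint8_t *cur, const uint8_t *prev, int stride,
                          int w, int h, int bs, const MV *prev_mv, MV *mv)
{
    static const MV updates[8] = {
        {1, 0}, {-1, 0}, {0, 1}, {0, -1}, {2, 0}, {-2, 0}, {0, 2}, {0, -2}
    };
    int bw = w / bs, bh = h / bs, u = 0;

    for (int j = 0; j < bh; j++) {
        for (int i = 0; i < bw; i++) {
            MV cand[6];
            int nc = 0;
            cand[nc++] = (MV){0, 0};
            if (i > 0) {              /* spatial predictor: left block */
                MV l = mv[j * bw + i - 1];
                cand[nc++] = l;
                cand[nc++] = (MV){l.x + updates[u].x, l.y + updates[u].y};
            }
            if (j > 0) {              /* spatial predictor: block above */
                MV t = mv[(j - 1) * bw + i];
                MV d = updates[(u + 1) % 8];
                cand[nc++] = t;
                cand[nc++] = (MV){t.x + d.x, t.y + d.y};
            }
            if (prev_mv) {            /* temporal predictor: below-right block */
                int ti = i + 1 < bw ? i + 1 : i, tj = j + 1 < bh ? j + 1 : j;
                cand[nc++] = prev_mv[tj * bw + ti];
            }
            u = (u + 1) % 8;          /* cycle through the update set */

            MV best = cand[0];
            int best_sad = INT_MAX;
            for (int c = 0; c < nc; c++) {
                int s = candidate_sad(cur, prev, stride, w, h, i * bs, j * bs,
                                      bs, cand[c]);
                if (s < best_sad) {
                    best_sad = s;
                    best = cand[c];
                }
            }
            mv[j * bw + i] = best;
        }
    }
}
```

Each block costs at most six SAD evaluations, compared with (2 * range + 1)^2 for a full search. This is where the speed difference reported for 3DRS comes from.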
> >
> > Once we implement these, we can deal with objects entering or exiting
> > the screen. I think this is the hole or overlapping problem addressed in
> > paper [2]; several approaches have been proposed to handle it, like median
> > filtering, spatial interpolation
> > (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=389461) or MC using
> > neighboring motion fields. Will look into it more. The hole or overlapping
> > problem is handled by the bilateral motion estimation used in paper [2]
> > ([*4]).
>
> > Also have to handle scene change issues. I read in some paper that they
> > are too computationally expensive.
>
> yes, scene changes will need to be handled too, it was a problem in
> mcfps too
> the quick solution is probably to just detect by some threshold that
> there is a scene change and then set all MVs to 0,0; that will look
> a lot better than random bits of images randomly moving and merging
> into each other
>
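The quick threshold-based fallback suggested above could look like this:

1. Compute the mean absolute difference between the two frames.
2. If it exceeds a threshold, zero every motion vector.

The metric and the threshold are assumptions, and the names are hypothetical. A plain frame difference will also fire on flashes and fast global motion, so the threshold needs tuning.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { int x, y; } MV;

/* Mean absolute difference between two w x h 8-bit planes. */
static double frame_mad(const uint8_t *a, const uint8_t *b, int stride,
                        int w, int h)
{
    long long sum = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sum += abs(a[y * stride + x] - b[y * stride + x]);
    return (double)sum / ((double)w * h);
}

/* If the frames differ by more than threshold on average, treat it as a
 * scene change and zero all nb_mv vectors, so motion compensation does not
 * move image fragments around between unrelated pictures.
 * Returns 1 when a scene change was detected. */
static int zero_mvs_on_scene_change(const uint8_t *prev, const uint8_t *cur,
                                    int stride, int w, int h, MV *mv,
                                    int nb_mv, double threshold)
{
    if (frame_mad(prev, cur, stride, w, h) <= threshold)
        return 0;
    for (int i = 0; i < nb_mv; i++)
        mv[i].x = mv[i].y = 0;
    return 1;
}
```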
> >
> > Which one do you think we should start with? I think it should be 3DRS,
> > the fastest of these three. Paper 2 compares the results of all three:
> > 3DRS runs at around 16 fps, [1] at ~7 fps and [2] at ~3 fps. In quality,
> > paper 2 outperforms both of them.
>
> Is the full text of paper 2 available somewhere?
>

http://www.mediafire.com/?nxmx358680k0d90
(I couldn't find the original link.)

> also, motion trajectories should be interpolated through more than
> 2 frames. I don't know if the quoted papers do that, but
> vf_mcfps already provides the framework for this (aka it's necessary
> to have 2 future and past frames available).
> A random paper which seems to compare linear vs cubic shows very
> significant gains:
> http://www.ripublication.com/ijaer16/ijaerv11n10_42.pdf
> I don't know if that paper is good or not, but for example a ball (to
> keep the example used previously) would move along the edges of a
> polygon if linear MV interpolation is used. That might work with a
> slow-moving ball, but a spinning wheel would be heavily distorted
> with interpolation along straight lines.
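The straight-line problem shows up clearly with four positions of a point on a rotating wheel, at frames -1, 0, 1 and 2. That is the 2 past / 2 future window mentioned above. The sketch compares linear interpolation between frames 0 and 1 with a Catmull-Rom cubic through all four positions. Catmull-Rom is just one possible cubic, chosen here for illustration; it is not necessarily what the linked paper uses. The names are hypothetical.

```c
typedef struct { double x, y; } Point;

/* Linear interpolation between the positions at frames 0 (p1) and 1 (p2),
 * with t in [0, 1]. */
static Point lerp_pos(Point p1, Point p2, double t)
{
    Point r = { p1.x + (p2.x - p1.x) * t, p1.y + (p2.y - p1.y) * t };
    return r;
}

/* One coordinate of a Catmull-Rom cubic through p0..p3 (frames -1..2),
 * evaluated between p1 (t = 0) and p2 (t = 1). */
static double catmull_rom_1d(double p0, double p1, double p2, double p3,
                             double t)
{
    return 0.5 * (2.0 * p1 + (p2 - p0) * t +
                  (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t +
                  (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t * t * t);
}

static Point catmull_rom_pos(Point p0, Point p1, Point p2, Point p3, double t)
{
    Point r = { catmull_rom_1d(p0.x, p1.x, p2.x, p3.x, t),
                catmull_rom_1d(p0.y, p1.y, p2.y, p3.y, t) };
    return r;
}
```

Take positions 30 degrees apart on a circle of radius 100 and interpolate halfway between frames 0 and 1. The linear estimate falls about 3.4 pixels inside the circle. The cubic estimate lands within 0.2 pixels of the true position.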
>
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB