[FFmpeg-devel] [PATCH] avcodec/mlp*: improvements

Mon Oct 30 15:14:53 EET 2023

On Wed, 2023-10-25 at 21:59 +0200, Paul B Mahol wrote:
> On Wed, Oct 25, 2023 at 9:03 PM Tomas Härdin <git at haerdin.se> wrote:
> 
> > On Wed, 2023-10-25 at 21:00 +0200, Paul B Mahol wrote:
> > > On Wed, Oct 25, 2023 at 8:39 PM Tomas Härdin <git at haerdin.se>
> > > wrote:
> > > 
> > > > 
> > > > >             if (c) {
> > > > >                 e[0] = 1 << 14;
> > > > >                 e[1] = 0 << 14;
> > > > >                 e[2] = v[1];
> > > > >                 e[3] = v[0];
> > > > >             } else {
> > > > >                 e[0] = v[0];
> > > > >                 e[1] = v[1];
> > > > >                 e[2] = 0 << 14;
> > > > >                 e[3] = 1 << 14;
> > > > >             }
> > > > > 
> > > > >             if (invert2x2(e, d)) {
> > > > >                 sum = UINT64_MAX;
> > > > >                 goto next;
> > > > >             }
> > > > > 
> > > > 
> > > > You can make use of the properties of e to simplify calculating
> > > > the
> > > > inverse. The determinant is always v[0]<<14, so you can just do
> > > > if
> > > > (!v[0]) continue; and skip the determinant check altogether.
> > > > 
> > > 
> > > Even for the real 2x2 matrix case (once one of the rows is not 1, 0)?
> > > I may add such cases later.
> > 
> > You can just work the math out on paper. Inverse of
> > 
> >  1     0
> >  v[1]  v[0]
> > 
> > is
> > 
> >  1           0
> >  -v[1]/v[0]  1/v[0]
> > 
> > not accounting for shifts.
> > 
> 
> But I want to add a real 2x2 matrix with no zero cell, with:
> 
> a, b
> c, d
> 
> later. (even though gains are small, as encoded files use it rarely)

If this is possible within MLP then yes, do that. It is not clear from
what you've told me so far and from my brief reading of the code how
capable the format is.
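
For the triangular case quoted above, the closed-form inverse can be
sketched as follows. This is only an illustration in floating point, "not
accounting for shifts" as noted in the quote; invert_lower_unit() is a
hypothetical helper and not something in the patch or in libavcodec.

```c
/* Hypothetical helper (not from the patch): inverts the 2x2 matrix
 *
 *     [ 1     0    ]
 *     [ v[1]  v[0] ]
 *
 * in floating point, ignoring the 14-bit fixed-point shifts.
 * The determinant is v[0], so the only failure case is v[0] == 0.
 * d receives the inverse in row-major order.
 * Returns 0 on success, -1 if the matrix is singular. */
static int invert_lower_unit(const double v[2], double d[4])
{
    if (v[0] == 0.0)
        return -1;

    d[0] = 1.0;
    d[1] = 0.0;
    d[2] = -v[1] / v[0];
    d[3] = 1.0 / v[0];
    return 0;
}
```

With v = { 0.5, 0.25 } this gives d = { 1, 0, -0.5, 2 }, and no general
determinant computation is needed.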

> > Also RE: my other comments, you are right. I didn't take into
> > account
> > that MLP is lossless and that there may be off-by-one errors.
> > 
> > And as I said on IRC you can formulate this as a least squares
> > problem,
> > then solve it using a linear system solve. This patch seems to find a
> > solution that minimizes L1 rather than L2 though. Not sure what the
> > implications of that are compressionwise. What happens if you
> > replace
> > FFABS() with a square for scoring?
> > 
> 
> It reduces size usually by less than 0.002 %
> 
> Linear system solver gives vectors to create equations for both
> channels at
> same time?

L2 minimization allows using ordinary least squares. As I said on IRC,
the rub lies in formulating the problem properly. Minimizing L1 is much
harder, since it involves solving a linear program. Of course for
practical purposes we don't need an exact solution.
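
To illustrate why L2 is the easy case, here is a minimal sketch assuming
the problem is formulated as predicting one channel from the other with a
single coefficient k. That formulation is an assumption made for the
example, and ols_one_coeff() is a hypothetical name; neither comes from
the patch. The minimizer then has a closed form:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): ordinary least squares for
 * a single coefficient. Finds k minimising sum_i (l[i] - k * r[i])^2,
 * which is k = <l, r> / <r, r>.
 * Returns 0 on success, -1 if r is all zeros (no unique solution). */
static int ols_one_coeff(const int32_t *l, const int32_t *r, size_t n,
                         double *k)
{
    double lr = 0.0, rr = 0.0;

    for (size_t i = 0; i < n; i++) {
        lr += (double)l[i] * r[i];
        rr += (double)r[i] * r[i];
    }
    if (rr == 0.0)
        return -1;

    *k = lr / rr;
    return 0;
}
```

With more unknowns the same idea leads to a small linear system (the
normal equations) instead of a single division.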

Looking a bit more at the code, what is important is the decoding
coefficients, the d matrix. The encoder is free to choose d and the
encoded residuals so long as it decodes correctly. The decoder is
specified on d, not e.

Currently only one matrix is used (count=1 in estimate_coeff). With two
matrices something akin to a lifting scheme can be performed. This
means almost any 2x2 transform should be possible to perform (modulo
bitexactness concerns).

What I mean by lifting scheme here is that any 2x2 matrix A can be
decomposed into the product of two or more matrices of the form that e
has. I think.

We could potentially do something like alternating transforms of this
form:

l += k1*r;
r += k2*l;
l += k3*r;
r += k4*l;

This can always be inverted provided the intermediate results don't go
out of range, or in the event that they do go out of range, the decoder
is sufficiently well specified so that encoder and decoder don't go out
of sync. Compare how YCoCg-R is specified and fits in 3*8 bits. In fact
the Wikipedia article on YCoCg perhaps gets the point across better:
https://en.wikipedia.org/wiki/YCoCg
It in turn links to this Stack Overflow post, which makes the same point:
https://stackoverflow.com/questions/10566668/lossless-rgb-to-ycbcr-transformation/12146329#12146329
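
A minimal sketch of the four alternating steps and their inverse, to show
why rounding does not break losslessness. The names, the 14-bit
coefficient precision and the truncating division are assumptions made
for the example, not how MLP specifies it, and the code assumes inputs
small enough that nothing overflows int32_t.

```c
#include <stdint.h>

#define LIFT_SHIFT 14 /* assumed fixed-point precision of k */

/* One lifting term: k * x scaled down by 2^LIFT_SHIFT, truncated toward
 * zero. Any rounding works as long as both sides use the same one. */
static int32_t lift_term(int32_t k, int32_t x)
{
    return (int32_t)(((int64_t)k * x) / (1 << LIFT_SHIFT));
}

/* Forward: l += k1*r; r += k2*l; l += k3*r; r += k4*l; */
static void lift_forward(int32_t *l, int32_t *r, const int32_t k[4])
{
    *l += lift_term(k[0], *r);
    *r += lift_term(k[1], *l);
    *l += lift_term(k[2], *r);
    *r += lift_term(k[3], *l);
}

/* Inverse: undo the steps in reverse order. Each step changed only one
 * channel, so the other channel still holds the exact value that was
 * used to compute the term, and subtracting it restores the input. */
static void lift_inverse(int32_t *l, int32_t *r, const int32_t k[4])
{
    *r -= lift_term(k[3], *l);
    *l -= lift_term(k[2], *r);
    *r -= lift_term(k[1], *l);
    *l -= lift_term(k[0], *r);
}
```

Running lift_forward() followed by lift_inverse() with the same k returns
the original l and r exactly, whatever the coefficients are, as long as
the no-overflow assumption holds.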

I believe any transform found by PCA can be converted into an
equivalent lifting scheme, and it will always be lossless provided
modulo behavior is specified correctly in the codec. I have no idea if
it is.

/Tomas