[FFmpeg-devel] [PATCH] lavfi: add xbr filter

Tue Oct 28 23:06:53 CET 2014

On Tue, Oct 28, 2014 at 10:51:27PM +0100, Michael Niedermayer wrote:
> On Tue, Oct 28, 2014 at 07:16:45PM +0100, Clément Bœsch wrote:
> > On Tue, Oct 28, 2014 at 06:30:34PM +0100, Stefano Sabatini wrote:
> > [...]
> > > How much effort would it take to implement the remaining scaling modes?
> > > 
> > 
> > According to
> > https://ffmpeg.org/pipermail/ffmpeg-devel/2014-October/164574.html
> > 
> > "I think 4x can be done fast enough, but 3x will take time."
> > 
> > [...]
> > > > +typedef struct {
> > > > +    uint32_t rgbtoyuv[1<<24];
> > > 
> > > We should avoid this 64MiB. Also the table should be possibly static,
> > > so you don't have to fill it per each xBR instance.
> > > 
> > 
> > So, I requested to do it exactly the same as HQx because this part is
> > common according to the specifications. This should be kept the same as
> > vf_hqx, and then factorized.
> > 
> 
> > Now about removing this allocation, I did benchmark this LUT vs
> > computation (see attached patch for comp. version). And the problem is
> > that it's slightly slower, probably due to the /1000.
> 
> why do you divide at all?
> can't you do the computations with full precision?

I wasn't able to... but I was probably doing it wrong.

And anyway, so far I observed this:
  lut:         127fps
  nolut+div:   119fps
  nolut+nodiv: 123fps

So even with the "fast" computation, it's still about 3% slower than the LUT
(123 vs. 127 fps). It probably doesn't matter much in practice, and dropping
that huge table might be worth the performance cost (feel free to discuss).

Note that the original code (which worked on rgb565 only) was bit-exact.
24-bit RGB support was added in the "modern" hqx, using floating point. So
we can probably tolerate the inaccuracy. Still, if you find a way of keeping
full accuracy with the modern implementation...

Typically, I tried stuff like this:

  const uint32_t y = (uint32_t)((1225*r + 2404*g +  467*b + (1<<11)) >> 12);
  const uint32_t u = (uint32_t)((-692*r - 1356*g + 2048*b + (1<<11)) >> 12) + 128;
  const uint32_t v = (uint32_t)((2048*r - 1716*g -  332*b + (1<<11)) >> 12) + 128;

...but I'm probably doing it very wrong somewhere (sign issue maybe?), haven't
looked deeper. I went up to 15 bits, still didn't match, so I was probably
doing something stupid.
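
One way to rule out the sign question is to fold the +128 offset into the
accumulator before shifting, so the shifted value is never negative. This is
only a sketch under assumptions: rgb2yuv_fixed() and yuv_t are hypothetical
names, not code from vf_hqx or vf_xbr, and the coefficients are the 12-bit
ones from the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three components (illustration only). */
typedef struct { uint32_t y, u, v; } yuv_t;

/* Hypothetical helper: 12-bit fixed-point RGB -> YUV using signed ints. */
static yuv_t rgb2yuv_fixed(int r, int g, int b)
{
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);

    const int rnd = 1 << 11;   /* rounding term: half of 1<<12 */
    const int off = 128 << 12; /* chroma offset, pre-scaled by 1<<12 */
    yuv_t out;

    /* With 8-bit inputs the u/v sums are at least -522240, and off + rnd
     * is 526336, so every accumulator is non-negative before the shift. */
    out.y = (uint32_t)(( 1225*r + 2404*g +  467*b       + rnd) >> 12);
    out.u = (uint32_t)(( -692*r - 1356*g + 2048*b + off + rnd) >> 12);
    out.v = (uint32_t)(( 2048*r - 1716*g -  332*b + off + rnd) >> 12);
    return out;
}
```

With this rounding, pure red (255,0,0) gives v = 256, so a caller packing the
result into 8-bit fields would need a clip. A version that truncates instead
of rounding can differ by one in such cases, which might be one source of the
mismatch, but that is a guess.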

> also instead of doing 2 rgb2yuv and then taking their difference you
> can do the difference in rgb space and convert the rgb difference to
> a yuv difference
> it's just aM - bM = (a-b)M

Ah, sounds like a good idea, I guess I'll try that.
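
A rough sketch of that identity, assuming 0xRRGGBB packed pixels and the same
12-bit coefficients as above (all function and type names here are
hypothetical). Only the linear part is kept, scaled by 4096 with no rounding
or shift, so both paths agree exactly; the +128 chroma offsets cancel in the
difference anyway.

```c
#include <stdint.h>

/* Hypothetical type: YUV values scaled by 1<<12, possibly negative. */
typedef struct { int32_t y, u, v; } yuvdiff_t;

/* Linear part of the transform only: no offset, no rounding, no shift. */
static yuvdiff_t rgb_to_yuv_scaled(int r, int g, int b)
{
    yuvdiff_t o;
    o.y = 1225*r + 2404*g +  467*b;
    o.u = -692*r - 1356*g + 2048*b;
    o.v = 2048*r - 1716*g -  332*b;
    return o;
}

/* Two conversions, then a subtraction: aM - bM. */
static yuvdiff_t yuv_diff_two_conversions(uint32_t c1, uint32_t c2)
{
    yuvdiff_t a = rgb_to_yuv_scaled((c1 >> 16) & 0xff, (c1 >> 8) & 0xff, c1 & 0xff);
    yuvdiff_t b = rgb_to_yuv_scaled((c2 >> 16) & 0xff, (c2 >> 8) & 0xff, c2 & 0xff);
    yuvdiff_t d = { a.y - b.y, a.u - b.u, a.v - b.v };
    return d;
}

/* One subtraction in RGB space, then a single conversion: (a-b)M. */
static yuvdiff_t yuv_diff_from_rgb(uint32_t c1, uint32_t c2)
{
    int dr = (int)((c1 >> 16) & 0xff) - (int)((c2 >> 16) & 0xff);
    int dg = (int)((c1 >>  8) & 0xff) - (int)((c2 >>  8) & 0xff);
    int db = (int)( c1        & 0xff) - (int)( c2        & 0xff);
    return rgb_to_yuv_scaled(dr, dg, db);
}
```

If each conversion is rounded and shifted on its own before subtracting, the
two paths only agree up to rounding, so thresholds compared against the
difference would have to be expressed at the same scale.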

[...]

-- 
Clément B.