[MPlayer-dev-eng] [PATCH] vf_eq2 extensions

Fri Jan 31 22:42:31 CET 2003

On Fri, Jan 31, 2003 at 08:08:39PM +0100, Michael Niedermayer wrote:
> Hi
> 
> On Friday 31 January 2003 19:32, D Richard Felker III wrote:
> > On Fri, Jan 31, 2003 at 06:17:20PM +0100, Michael Niedermayer wrote:
> > > Hi
> > >
> > > On Friday 31 January 2003 17:20, D Richard Felker III wrote:
> > > [...]
> > >
> > > > > I am reluctant to put this stuff in vf_eq.c because the main
> > > > > function that I need (the main reason for vf_eq2 to exist) is
> > > > > gamma correction and that's the one thing that can't be done
> > > > > efficiently in vf_eq.c.
> > > >
> > > > Hmm, someone should add polynomial approx gamma correction to eq then
> > > > so we can just make eq2 obsolete.
> > >
> > > I doubt that evaluating a polynomial is faster than a single L1 cache read
> > > from a 256-byte LUT.
> >
> > Nope, it's not. But with MMX, evaluating 4 polynomials is just as fast
> > as evaluating one. :) And loading a single byte from memory, then
> > immediately using it as a 32-bit offset into a lookup table, is VERY
> > SLOW.
> no
> 
> > x86 CPUs don't like mixing register sizes these days.
> Yes, that's why there are very fast instructions to load 8- and 16-bit
> values with zero or sign extension into a 32-bit register:
> mov{z,s}{bl,wl}.
> IIRC I once saw some very suboptimal code from gcc 2.95 with mixed short
> and int code in the swscaler; just rewriting it in asm (no MMX) resulted
> in a 2x speedup.

They're quite slow too on the original Pentium and Pentium MMX, and that
was the target I was optimizing for, since everything beyond that was
10x faster than needed anyway. :) The sign-extended loads might also be
a bit slow on the K6, though; I'd have to check.

In any case, thanks for reminding me about those. I'd almost forgotten
they existed.

Rich