[MPlayer-dev-eng] swscale question

Wed Oct 31 19:31:31 CET 2001

Hello, Michael!
On Tue, 30 Oct 2001 21:23:36 +0200, you wrote:

> Hi
> 
> On Tuesday 30 October 2001 19:14, Nick Kurshev wrote:
> > Hello, Michael!
> >
> > I've been looking at your code and have some questions for you:
> > 1. Why did you add the "normal" (non-MMX) asm optimization?
> >  #endif
> > 	//NO MMX just normal asm ...
> > 	asm volatile(
> > 		"xorl %%eax, %%eax		\n\t" // i
> > 		"xorl %%ebx, %%ebx		\n\t" // xx
> > 		"xorl %%ecx, %%ecx		\n\t" // 2*xalpha
> > 		"1:				\n\t"
> > 		"movzbl  (%0, %%ebx), %%edi	\n\t" //src[xx]
> > 		"movzbl 1(%0, %%ebx), %%esi	\n\t" //src[xx+1]
> > Which CPU is it optimized for (Pentium, Pentium MMX, PPro, K6 or K7)?
> hmm, mine ;) ... (P3 at 500)
> it was written before the mmx2 code
Then - sorry!
> 
> > IMHO we should not ignore gcc's optimization capabilities; it
> > produces well-optimized code for the targeted architectures.
> > (Even if you've gained 1-2% on your CPU, it doesn't mean that
> > we'll get the same speedup on every CPU.)
> I fully agree with using gcc if it outputs sane code, although a simple
> ./mplayer -vo x11 -pp 0 -zoom -xy 2 ~/ff.mpg  -benchmark shows:
> gcc: 16.532 sec
> asm: 12.467 sec
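A quick check of the two timings above, assuming both are total -benchmark run times for the same file: the asm build is about 1.33 times as fast, i.e. the run takes about 25% less time. If the rest of the decode path costs the same in both builds, the whole 4.065 s difference comes from the scaler, so the scaler's own speedup is larger than 1.33x.

```latex
\frac{16.532\,\mathrm{s}}{12.467\,\mathrm{s}} \approx 1.326,
\qquad
\frac{16.532 - 12.467}{16.532} = \frac{4.065}{16.532} \approx 0.246
```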
> Looking at gcc's output, there seems to be at least one partial register
> stall (a 5-cycle loss on PPro, P2 and P3).
> The C functions are not really optimized (this one does 2 multiplies
> although 1 would be enough, ...).
> gcc didn't use add/adc either, ...
> 
> > On the other hand, besides gcc there are other compilers which
> > can produce better code than gcc does now, but I hope that in the
> > future gcc will improve enough to match them.
> >
> > 2. Although your code is fairly well scheduled, the first lines could be
> > scheduled better (and they are the first thing everyone looks at):
> I can't see a difference with -benchmark, did you try it?
> 
First: in this spot the difference may be too small to detect without rdtsc.
Second: your code has a lot of such places (copies), doesn't it?
Third: as I see it, you have a P3, which can execute the instruction stream
out of order (same as the PPro).
But what about the Pentium MMX, which has no out-of-order execution at all?
Anyway, the P3 can only execute up to 3 instructions per clock out of order.
That is too few, IMHO.
> ...
> >
> > Friendly! Nick
> 
> Michael
> _______________________________________________
> MPlayer-dev-eng mailing list
> MPlayer-dev-eng at mplayerhq.hu
> http://mplayerhq.hu/mailman/listinfo/mplayer-dev-eng
> 

Best regards! Nick