[MPlayer-dev-eng] swscale question
Nick Kurshev
nickols_k at mail.ru
Wed Oct 31 19:31:31 CET 2001
Hello, Michael!
On Tue, 30 Oct 2001 21:23:36 +0200, you wrote:
> Hi
>
> On Tuesday 30 October 2001 19:14, Nick Kurshev wrote:
> > Hello, Michael!
> >
> > I've been looking at your code and have some questions for you:
> > 1. Why did you add the "normal" asm optimization?
> > #endif
> > //NO MMX just normal asm ...
> > asm volatile(
> > "xorl %%eax, %%eax \n\t" // i
> > "xorl %%ebx, %%ebx \n\t" // xx
> > "xorl %%ecx, %%ecx \n\t" // 2*xalpha
> > "1: \n\t"
> > "movzbl (%0, %%ebx), %%edi \n\t" //src[xx]
> > "movzbl 1(%0, %%ebx), %%esi \n\t" //src[xx+1]
> > Which cpu is it optimized for (pent, pent-mmx, ppro or k6, k7)?
> hmm, mine ;) ... (P3 at 500)
> it was written before the mmx2 code
Then - sorry!
>
> > IMHO we should not ignore the optimizing capabilities of gcc, which
> > produces well enough optimized code for the targeted architectures.
> > (Even if you've gained 1-2% on your cpu, it doesn't mean that
> > we'll get the same speedup on every cpu.)
> i fully agree with using gcc if it outputs sane code, although a simple
> ./mplayer -vo x11 -pp 0 -zoom -xy 2 ~/ff.mpg -benchmark shows:
> gcc: 16.532 sec
> asm: 12.467 sec
> looking at the output of gcc, it seems there is at least one partial register
> stall (5 cycles lost on ppro, p2, p3)
> the c functions are not really optimized (this one does 2 multiplies
> although 1 would be enough, ...)
> gcc didn't use add/adc either, ...
>
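The "2 multiplies although 1 would be enough" remark can be illustrated with a minimal C sketch of horizontal linear interpolation. This is not the swscale source: the function names, the 16.16 fixed-point position and the 7-bit weight are assumptions chosen for the example. The caller is assumed to guarantee that src[xx + 1] is readable for every output sample.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: two multiplies per output sample,
 * using the weights (128 - alpha) and alpha. */
static void hscale_two_mul(uint16_t *dst, int dst_w,
                           const uint8_t *src, uint32_t x_inc)
{
    uint32_t xpos = 0;                 /* 16.16 fixed-point source position */
    assert(dst_w >= 0);
    for (int i = 0; i < dst_w; i++) {
        uint32_t xx = xpos >> 16;                 /* integer source index */
        int alpha = (int)((xpos & 0xFFFF) >> 9);  /* 7-bit fraction, 0..127 */
        dst[i] = (uint16_t)(src[xx] * (128 - alpha) + src[xx + 1] * alpha);
        xpos += x_inc;
    }
}

/* Same result with one multiply:
 * a*(128 - alpha) + b*alpha == a*128 + (b - a)*alpha. */
static void hscale_one_mul(uint16_t *dst, int dst_w,
                           const uint8_t *src, uint32_t x_inc)
{
    uint32_t xpos = 0;
    assert(dst_w >= 0);
    for (int i = 0; i < dst_w; i++) {
        uint32_t xx = xpos >> 16;
        int alpha = (int)((xpos & 0xFFFF) >> 9);
        int a = src[xx];
        int b = src[xx + 1];
        dst[i] = (uint16_t)((a << 7) + (b - a) * alpha);
        xpos += x_inc;
    }
}
```

Both functions produce identical output for the same input; whether the single-multiply form is actually faster depends on the cpu and on what the compiler generates.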
> > On the other hand, besides gcc there are other compilers which
> > can currently produce better code than gcc, but I hope that in the
> > future gcc will be improved enough for that.
> >
> > 2. Although your code is scheduled well enough, the first lines could be
> > scheduled better (besides, they are the first thing everyone looks at):
> i can't see a difference with -benchmark, did you try it?
>
First: in this place the difference may be too small to detect without rdtsc.
Second: your code has a lot of such places (copies), doesn't it?
Third: as I see, you have a P3, which can execute the insn stream out of order
(same as PPro).
But what about the Pent-MMX, which has no such capability?
Anyway, the P3 can only execute up to 3 insns per cpu clock out of order.
That is too few, imho.
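A minimal sketch of the rdtsc point above, assuming GCC or Clang on x86, where the __rdtsc() intrinsic from <x86intrin.h> is available. The copy_job workload and all names here are hypothetical stand-ins for the code under test. On non-x86 builds the sketch falls back to clock(), which is far coarser. Note that on newer cpus the time-stamp counter may tick at a fixed reference rate rather than the core clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>   /* GCC/Clang header providing __rdtsc() */
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Fallback for non-x86 builds: much coarser than a cycle counter. */
static uint64_t read_ticks(void) { return (uint64_t)clock(); }
#endif

/* Hypothetical workload: a plain byte copy standing in for the code under test. */
struct copy_job {
    const uint8_t *src;
    uint8_t *dst;
    size_t len;
    int calls;           /* how many times the job has run */
};

static void run_copy(struct copy_job *job)
{
    for (size_t i = 0; i < job->len; i++)
        job->dst[i] = job->src[i];
    job->calls++;
}

/* Run the job `runs` times and return the smallest tick count seen.
 * Taking the minimum filters out interrupts and cold-cache runs. */
static uint64_t min_ticks(struct copy_job *job, int runs)
{
    uint64_t best = UINT64_MAX;
    assert(runs > 0);
    for (int r = 0; r < runs; r++) {
        uint64_t start = read_ticks();
        run_copy(job);
        uint64_t elapsed = read_ticks() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Comparing min_ticks() for two variants of the same short sequence can expose differences that disappear in a whole-program -benchmark run.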
> ...
> >
> > Friendly! Nick
>
> Michael
> _______________________________________________
> MPlayer-dev-eng mailing list
> MPlayer-dev-eng at mplayerhq.hu
> http://mplayerhq.hu/mailman/listinfo/mplayer-dev-eng
>
Best regards! Nick