[Mplayer-users] [mplayer PATCH] SSE fastmemcpy improvements
Felix Bünemann
atmosfear at users.sourceforge.net
Fri Apr 20 18:30:00 CEST 2001
On Friday, 20 April 2001 at 19:14 you wrote:
> > Hello, Arpi!
>
> I would like to suggest a patch that improves the SSE-related part of
> fastmemcpy. After studying the Intel manuals there is one thing I don't
> understand: can we assume that data is temporarily aligned after executing
> a PREFETCH instruction or not? IMHO, probably yes.
>
> Could you test it on your Celeron II?
>
> To do so, please remove the block with the MOVUPS instructions and try to
> compile the test program I sent yesterday. If no GPF occurs, compare the
> benchmarks of the MOVUPS and MOVAPS blocks separately.
>
I commented out lines 102 to 120:
#if 0
if(((unsigned long)from) & 15)
[...]
else
#endif
I hope that's correct; otherwise mplayer won't compile.
But with the new code the x11, xv and sdl outputs crash. dga works, but it
doesn't use fastmemcpy.h.
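For reference, here is a minimal sketch of what the `& 15` test selects between, assuming the disabled block is the unaligned (MOVUPS) path. The names is_aligned16 and copy16_sketch are made up for illustration and are not taken from fastmemcpy.h. The intrinsics _mm_load_ps and _mm_loadu_ps are the usual C spellings of MOVAPS and MOVUPS loads. MOVAPS raises a general-protection fault on an address that is not a multiple of 16, and a prefetch only pulls a cache line in, it does not change where the data sits. The stores here are unaligned on purpose so the sketch stays safe for any destination.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return (((uintptr_t)p) & 15) == 0;
}

/* Hypothetical sketch, not the fastmemcpy.h code: copy len bytes,
 * choosing the aligned or unaligned SSE load per the source address. */
static void copy16_sketch(void *to, const void *from, size_t len)
{
    unsigned char *d = to;
    const unsigned char *s = from;
#if defined(__SSE__)
    size_t blocks = len / 16;

    if (is_aligned16(s)) {
        /* Aligned source: MOVAPS-style load is allowed. */
        for (; blocks > 0; blocks--, s += 16, d += 16)
            _mm_storeu_ps((float *)d, _mm_load_ps((const float *)s));
    } else {
        /* Unaligned source: must use the MOVUPS-style load. */
        for (; blocks > 0; blocks--, s += 16, d += 16)
            _mm_storeu_ps((float *)d, _mm_loadu_ps((const float *)s));
    }
    len &= 15; /* bytes left over after the 16-byte blocks */
#endif
    /* Tail bytes, or the whole copy when SSE is not available. */
    memcpy(d, s, len);
}
```

Removing the unaligned branch and always taking the aligned one is only safe if every caller passes a 16-byte aligned source, which would be consistent with the crash in the backtrace below if draw_frame hands over an unaligned src.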
Here is a backtrace from x11 out:
Start playing...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1024 (LWP 6647)]
0x80a3c28 in draw_frame (src=0x81d8d78) at fastmemcpy.h:130
130 __asm__ __volatile__ (
(gdb) bt
#0 0x80a3c28 in draw_frame (src=0x81d8d78) at fastmemcpy.h:130
#1 0x8059a79 in main (argc=4, argv=0xbffff684, envp=0xbffff698) at mplayer.c:1485
#2 0x402febaf in __libc_start_main () from /lib/libc.so.6
Test results (10 runs each after compiling):
System is a PIII 750 Coppermine on a 124MHz FSB, so the CPU runs at 930MHz,
with 512MB PC133 SDRAM (timing 3-3-3) and Linux 2.2.18 with fxsr and xmm
support (according to /proc/cpuinfo). The test command issued was:
perl -e 'for($i = 0; $i < 11; $i++) { system "./fastmembench"; }'
normal:
v1 = 28231618005329 v2 = 28231618285403 v2-v1=280074
v1 = 28231620286810 v2 = 28231620561590 v2-v1=274780
v1 = 28231622923672 v2 = 28231623201468 v2-v1=277796
v1 = 28231625232289 v2 = 28231625506627 v2-v1=274338
v1 = 28231627508402 v2 = 28231627783184 v2-v1=274782
v1 = 28231629791257 v2 = 28231630064900 v2-v1=273643
v1 = 28231632227140 v2 = 28231632504422 v2-v1=277282
v1 = 28231634507802 v2 = 28231634783418 v2-v1=275616
v1 = 28231636808167 v2 = 28231637081115 v2-v1=272948
v1 = 28231639091266 v2 = 28231639366823 v2-v1=275557
v1 = 28231641505781 v2 = 28231641783581 v2-v1=277800
mmx:
v1 = 28367967036396 v2 = 28367967316508 v2-v1=280112
v1 = 28367969777309 v2 = 28367970054707 v2-v1=277398
v1 = 28367972319284 v2 = 28367972594535 v2-v1=275251
v1 = 28367974853481 v2 = 28367975129946 v2-v1=276465
v1 = 28367977373594 v2 = 28367977649760 v2-v1=276166
v1 = 28367980343609 v2 = 28367980622806 v2-v1=279197
v1 = 28367982862319 v2 = 28367983137570 v2-v1=275251
v1 = 28367985392489 v2 = 28367985689934 v2-v1=297445
v1 = 28367988061199 v2 = 28367988337788 v2-v1=276589
v1 = 28367990574929 v2 = 28367990853039 v2-v1=278110
v1 = 28367993107491 v2 = 28367993382251 v2-v1=274760
old sse:
v1 = 28510748533056 v2 = 28510748820014 v2-v1=286958
v1 = 28510751078931 v2 = 28510751355465 v2-v1=276534
v1 = 28510753567671 v2 = 28510753842686 v2-v1=275015
v1 = 28510756328174 v2 = 28510756606844 v2-v1=278670
v1 = 28510758821204 v2 = 28510759095764 v2-v1=274560
v1 = 28510761318929 v2 = 28510761596703 v2-v1=277774
v1 = 28510763844966 v2 = 28510764119725 v2-v1=274759
v1 = 28510766505719 v2 = 28510766781884 v2-v1=276165
v1 = 28510769000804 v2 = 28510769273793 v2-v1=272989
v1 = 28510771496654 v2 = 28510771769578 v2-v1=272924
v1 = 28510774026831 v2 = 28510774301281 v2-v1=274450
new sse:
v1 = 28416198780569 v2 = 28416199068093 v2-v1=287524
v1 = 28416201342314 v2 = 28416201618301 v2-v1=275987
v1 = 28416203858106 v2 = 28416204131927 v2-v1=273821
v1 = 28416206397846 v2 = 28416206924789 v2-v1=526943
v1 = 28416209167094 v2 = 28416209443474 v2-v1=276380
v1 = 28416211684664 v2 = 28416211960693 v2-v1=276029
v1 = 28416214205969 v2 = 28416214482591 v2-v1=276622
v1 = 28416217151691 v2 = 28416217433731 v2-v1=282040
v1 = 28416219677669 v2 = 28416219953554 v2-v1=275885
v1 = 28416222207449 v2 = 28416222485195 v2-v1=277746
v1 = 28416224717436 v2 = 28416224992738 v2-v1=275302
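To compare the four variants without the single 526943 run skewing things, one option is to take the median of the v2-v1 deltas per variant. A small sketch in C follows. The helper names cmp_ulong and median_delta are made up for illustration, and the arrays simply repeat the deltas listed above.

```c
#include <stdlib.h>
#include <string.h>

#define RUNS 11

/* v2-v1 deltas copied from the four result lists above. */
static const unsigned long normal_d[RUNS] = {
    280074, 274780, 277796, 274338, 274782, 273643,
    277282, 275616, 272948, 275557, 277800 };
static const unsigned long mmx_d[RUNS] = {
    280112, 277398, 275251, 276465, 276166, 279197,
    275251, 297445, 276589, 278110, 274760 };
static const unsigned long old_sse_d[RUNS] = {
    286958, 276534, 275015, 278670, 274560, 277774,
    274759, 276165, 272989, 272924, 274450 };
static const unsigned long new_sse_d[RUNS] = {
    287524, 275987, 273821, 526943, 276380, 276029,
    276622, 282040, 275885, 277746, 275302 };

/* qsort comparator for unsigned long, ascending. */
static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: median of n deltas (1 <= n <= RUNS).
 * Works on a sorted copy, so the input array is left untouched.
 * Returns 0 for an unsupported n. */
static unsigned long median_delta(const unsigned long *v, size_t n)
{
    unsigned long tmp[RUNS];

    if (n == 0 || n > RUNS)
        return 0;
    memcpy(tmp, v, n * sizeof tmp[0]);
    qsort(tmp, n, sizeof tmp[0], cmp_ulong);
    return (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2;
}
```

Calling median_delta(normal_d, RUNS) and so on gives 275557 (normal), 276589 (mmx), 275015 (old sse) and 276380 (new sse), so on this machine the four variants are within about 0.6% of each other.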
> Anyway, this patch is probably fully workable, but it needs to be tested on
> your Celeron II. The full code of fastmemcpy.h is attached.
>
> Best regards! Nick
--
Best Regards,
Felix
_______________________________________________
Mplayer-users mailing list
Mplayer-users at lists.sourceforge.net
http://lists.sourceforge.net/lists/listinfo/mplayer-users