[Mplayer-users] [mplayer PATCH] fastmemcpy alignment for any cpu
Nick Kurshev
nickols_k at mail.ru
Mon Apr 23 11:48:41 CEST 2001
Hello, Arpi, Felix, all!
But what about aligning the destination on any CPU, not only SSE ones?
It improves speed on any CPU that supports fastmemcpy.
Please compare (Duron-750):
Old results (misaligned dest):
mmx: v2-v1=766128661 = 1009719us (99.037fps) 148.6MB/s
k6 : v2-v1=627413495 = 826906us (120.933fps) 181.4MB/s
k7 : v2-v1=221021775 = 291269us (343.325fps) 515.0MB/s
New results (aligned dest):
mmx: v2-v1=766126582 = 1009433us (99.066fps) 148.6MB/s
k6 : v2-v1=563458894 = 742432us (134.692fps) 202.0MB/s
k7 : v2-v1=144598822 = 190540us (524.824fps) 787.2MB/s
The patch is below:
--- fastmemcpy.h.old Sun Apr 22 23:25:47 2001
+++ fastmemcpy.h Mon Apr 23 09:29:57 2001
@@ -63,31 +63,33 @@
: "memory");\
}
+#ifdef HAVE_SSE
+#define MMREG_SIZE 16
+#else
+#define MMREG_SIZE 8
+#endif
+
inline static void * fast_memcpy(void * to, const void * from, unsigned len)
{
void *p;
int i;
-#ifdef HAVE_SSE /* Only P3 (may be Cyrix3) */
-// printf("fastmemcpy_pre(0x%X,0x%X,0x%X)\n",to,from,len);
- // Align dest to 16-byte boundary:
- if((unsigned long)to&15){
- int len2=16-((unsigned long)to&15);
- if(len>len2){
- len-=len2;
- __asm__ __volatile__(
- "rep ; movsb\n"
- :"=D" (to), "=S" (from)
- : "D" (to), "S" (from),"c" (len2)
- : "memory");
- }
- }
-// printf("fastmemcpy(0x%X,0x%X,0x%X)\n",to,from,len);
-#endif
-
if(len >= 0x200) /* 512-byte blocks */
{
+ register unsigned long int delta;
p = to;
+ /* Align destination to MMREG_SIZE boundary */
+ delta = ((unsigned long int)to)&(MMREG_SIZE-1);
+ if(delta)
+ {
+ delta=MMREG_SIZE-delta;
+ len -= delta;
+ __asm__ __volatile__(
+ "rep; movsb"
+ :"=D" (to), "=S" (from)
+ : "0" (to), "1" (from),"c" (delta)
+ : "memory");
+ }
i = len >> 6; /* len/64 */
len&=63;
Best regards! Nick
_______________________________________________
Mplayer-users mailing list
Mplayer-users at lists.sourceforge.net
http://lists.sourceforge.net/lists/listinfo/mplayer-users