[Mplayer-users] Question: Fastmemcpy and 3dnow

Nick Kurshev nickols_k at mail.ru
Wed Apr 18 22:36:11 CEST 2001


Hello!

I have found that Pontscho/fresh!mindworkz has corrected the fastmemcpy code. I don't know what timings the 
k6-2 gets with this version, but on k7 (and undoubtedly on P3) this version of small_memcpy is slower, because 
mplayer uses misaligned data, such as buffers that are only 2-byte aligned (I have already written about it).
Modern processors are very sensitive to misaligned memory access.
The P3 manual says:
On a P6 family processor, a misaligned access that crosses a cache line boundary costs 6 to 9 clocks.
On a P6 family processor, unaligned accesses that cause a data cache split stall the processor. A data 
cache split is a memory access that crosses a 32-byte cache line boundary.
(Unfortunately, this is only the P3 manual.)
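
To make the quoted rule concrete, here is a minimal sketch in C of how one could test whether a single access splits a cache line. The helper name crosses_cache_line and the CACHE_LINE_SIZE macro are my own illustrations, not anything from mplayer; the 32-byte line size is taken from the manual text quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache lines, as in the P6 manual text quoted above. */
#define CACHE_LINE_SIZE 32u

/* Hypothetical helper: returns 1 if an access of `size` bytes starting at
 * address `addr` touches more than one cache line, 0 otherwise. */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0)
        return 0;
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line = (addr + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

For example, a 4-byte access at offset 30 within a line splits it, while the same access at offset 28 does not.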

The K7 manual says:
Avoid misaligned data references. A misaligned store or load operation suffers a minimum 1-cycle penalty in
the AMD Athlon processor load/store pipeline.

On my Duron this is significant! The only way to avoid such situations is to use movntXX instructions or single movb copies.

I want to suggest that he correct it again by using the following:

#undef HAVE_3DNOW_K6
#if defined( HAVE_3DNOW ) && !defined( HAVE_MMX2 )
#define HAVE_3DNOW_K6
#endif

and after that, correct the code:

-#if 0
+#ifndef HAVE_3DNOW_K6
	small_memcpy(to, from, len);
#else
        __asm__ __volatile__ (
                "shrl $1,%%ecx\n"
                "jnc 1f\n"
                "movsb\n"
                "1:\n"
                "shrl $1,%%ecx\n"
                "jnc 2f\n"
                "movsw\n"
                "2:\n"
                "rep ; movsl\n"
                ::"D" (to), "S" (from),"c" (len)
                : "memory");
#endif
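
For readers who do not read AT&T asm easily, here is a rough C rendering of what the block above does. It is only a sketch: the name copy_tail_first is hypothetical, and memcpy with a fixed size stands in for movsw/movsl. Note that the decisions depend only on len, never on the addresses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C equivalent of the asm: copy one byte if bit 0 of len is
 * set, then one 16-bit word if bit 1 is set, then len/4 32-bit words. */
static void *copy_tail_first(void *to, const void *from, size_t len)
{
    unsigned char *d = to;
    const unsigned char *s = from;

    if (len & 1) {                  /* "shrl / jnc 1f / movsb" */
        *d++ = *s++;
    }
    if (len & 2) {                  /* "shrl / jnc 2f / movsw" */
        memcpy(d, s, 2);
        d += 2;
        s += 2;
    }
    for (size_t words = len >> 2; words > 0; words--) {  /* "rep ; movsl" */
        memcpy(d, s, 4);
        d += 4;
        s += 4;
    }
    return to;
}
```

Because the odd byte and the odd word are peeled off according to the length only, the 32-bit loop starts at whatever alignment the pointers happen to have at that point.
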
If SOURCE and DEST are only 2-byte aligned then (in general) you will never be able to align both of them on a 4-byte 
boundary, so MOVSL will work with misaligned data anyway.
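
The point can be checked with a small sketch (the helper name can_coalign4 is hypothetical): skipping leading bytes moves both pointers by the same amount, so both can reach a 4-byte boundary together only if their addresses already agree in the low two bits.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if dst and src can both be brought to a
 * 4-byte boundary by skipping the same number of leading bytes, i.e. if
 * their addresses are equal modulo 4; returns 0 otherwise. */
static int can_coalign4(const void *dst, const void *src)
{
    return (((uintptr_t)dst ^ (uintptr_t)src) & 3u) == 0;
}
```

With a destination at offset 0 and a source at offset 2 inside the same buffer this returns 0: whichever pointer is aligned first, the other one stays 2 bytes off for the whole copy.
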
Please study this fragment of code from vo_sdl.c:

static uint32_t draw_frame(uint8_t *src[])
{
	struct sdl_priv_s *priv = &sdl_priv;
	uint8_t *dst;
...
        switch(priv->format){
        case IMGFMT_YV12:
        case IMGFMT_I420:
        case IMGFMT_IYUV:
...
	memcpy (dst, src[2], priv->framePlaneUV);
	dst += priv->framePlaneUV;
	memcpy (dst, src[1], priv->framePlaneUV);


Best regards! Nick


