[FFmpeg-devel] [PATCH] some SIMD write-combining for h264

Sun Jan 17 21:52:08 CET 2010

On Sun, Jan 17, 2010 at 04:59:55AM -0500, Alexander Strange wrote:
> 
> On Jan 16, 2010, at 9:59 PM, M?ns Rullg?rd wrote:
> 
> > Alexander Strange <astrange at ithinksw.com> writes:
> > 
> >> On Jan 16, 2010, at 12:35 AM, Michael Niedermayer wrote:
> >> 
> >>> On Fri, Jan 15, 2010 at 11:11:23PM -0500, Alexander Strange wrote:
> >>>> This adds intreadwrite macros for 64/128-bit memory operations and uses them in h264.
> >>>> 
> >>>> Unlike the other macros, these assume correct alignment, and the patch only defines the ones there was an immediate use for.
> >>>> This only has x86 versions, but others should be easy. The 64-bit operations can be done with double copies on most systems, I guess.
> >>>> 
> >>>> Decoding a 30s file on Core 2 Merom with --cpu=core2 (minimum of 5 runs):
> >>>> x86-32: 12.72s before, 12.51s after (1.7%)
> >>>> x86-64: 10.24s before, 10.20s after (.4%)
> >>>> 
> >>>> Tested on x86-32, x86-64, x86-32 with --arch=c.
> >>> 
> >>> as your code uses MMX you need to at least mention EMMS/float issue in the
> >>> dox and probably a  emms_c(); call before draw_horiz_band()
> >>> dunno if these are all
> >> 
> >> Added in the comment.
> >> 
> >>> also what sets __MMX__ ? we have our own defines for that
> >> 
> >> It's a gcc builtin define, set based on ./configure --cpu=x adding
> >> -march.  HAVE_MMX is for the build and not the host cpu family, and
> >> this is inlined asm, so it can't use it.
> > 
> > Huh?  Host... build???
> 
> Oh, that was supposed to be "target"...
> Anyway, this is MMX being used like the cmov/clz inlines, so it depends on the given --cpu and not on the build system's capabilities.

do all compilers set __MMX__ ?
if one sets it wrong what will you do? we have no busines defining any __ 

[...]
> 
> >> +#endif /* __SSE__ */
> >> +
> >> +#if __SSE2__
> > 
> > Use #ifdef.
> > 
> >> +#define AV_ZERO128 AV_ZERO128
> >> +static inline void AV_ZERO128(void *d)
> >> +{
> >> +    typedef struct {uint64_t i[2];} v;
> >> +
> >> +    __asm__("pxor %%xmm0, %%xmm0  \n\t"
> >> +            "movdqa   %%xmm0, %0  \n\t"
> >> +            : "=m"(*(v*)d)
> >> +            :: "xmm0");
> >> +}
> > 
> > 
> > Same as above about the typedef.
> > 
> >> +#endif /* __SSE2__ */
> >> +
> >> +#endif /* AVUTIL_X86_INTREADWRITE_H */
> >> -- 
> >> 1.6.5.2
> 

>  intreadwrite.h     |   47 +++++++++++++++++++++++++--
>  x86/intreadwrite.h |   92 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 137 insertions(+), 2 deletions(-)
> d2af02fbeb5e0527ca4e31270370651164fd8c8b  0001-Add-macros-for-write-combining-optimization.patch
> From 5a824eba882e0518d3892eb39d089f262b21c6b4 Mon Sep 17 00:00:00 2001
> From: Alexander Strange <astrange at ithinksw.com>
> Date: Fri, 15 Jan 2010 01:47:47 -0500
> Subject: [PATCH 1/2] Add macros for write-combining optimization.
> 
> ---
>  libavutil/intreadwrite.h     |   47 ++++++++++++++++++++-
>  libavutil/x86/intreadwrite.h |   92 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 137 insertions(+), 2 deletions(-)
>  create mode 100644 libavutil/x86/intreadwrite.h
> 
> diff --git a/libavutil/intreadwrite.h b/libavutil/intreadwrite.h
> index 933732c..15481aa 100644
> --- a/libavutil/intreadwrite.h
> +++ b/libavutil/intreadwrite.h
> @@ -25,8 +25,9 @@
>  
>  /*
>   * Arch-specific headers can provide any combination of
> - * AV_[RW][BLN](16|24|32|64) macros.  Preprocessor symbols must be
> - * defined, even if these are implemented as inline functions.
> + * AV_[RW][BLN](16|24|32|64) and AV_(COPY|SWAP|ZERO)(64|128) macros.
> + * Preprocessor symbols must be defined, even if these are implemented
> + * as inline functions.
>   */
>  
>  #if   ARCH_ARM
> @@ -37,6 +38,8 @@
>  #   include "mips/intreadwrite.h"
>  #elif ARCH_PPC
>  #   include "ppc/intreadwrite.h"
> +#elif ARCH_X86
> +#   include "x86/intreadwrite.h"
>  #endif
>  
>  /*
> @@ -397,4 +400,44 @@ struct unaligned_16 { uint16_t l; } __attribute__((packed));
>      } while(0)
>  #endif
>  
> +/* Parameters for AV_COPY*, AV_SWAP*, AV_ZERO* must be
> + * naturally aligned. They may be implemented using MMX,
> + * so emms_c() must be called before using any float code
> + * afterwards.
> + */
> +
> +#define AV_COPY(n, d, s) (*(uint##n##_t*)(d) = *(const uint##n##_t*)(s))
> +
> +#ifndef AV_COPY64
> +#   define AV_COPY64(d, s) AV_COPY(64, d, s)
> +#endif
> +
> +#ifndef AV_COPY128
> +#   define AV_COPY128(d, s)                    \
> +    do {                                       \
> +        AV_COPY64(d, s);                       \
> +        AV_COPY64((char*)(d)+8, (char*)(s)+8); \
> +    } while(0)
> +#endif
> +
> +#define AV_SWAP(n, a, b) FFSWAP(uint##n##_t, *(uint##n##_t*)(a), *(uint##n##_t*)(b))
> +
> +#ifndef AV_SWAP64
> +#   define AV_SWAP64(a, b) AV_SWAP(64, a, b)
> +#endif
> +
> +#define AV_ZERO(n, d) (*(uint##n##_t*)(d) = 0)
> +
> +#ifndef AV_ZERO64
> +#   define AV_ZERO64(d) AV_ZERO(64, d)
> +#endif
> +
> +#ifndef AV_ZERO128
> +#   define AV_ZERO128(d)         \
> +    do {                         \
> +        AV_ZERO64(d);            \
> +        AV_ZERO64((char*)(d)+8); \
> +    } while(0)
> +#endif
> +
>  #endif /* AVUTIL_INTREADWRITE_H */
> diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h
> new file mode 100644
> index 0000000..296ecec
> --- /dev/null
> +++ b/libavutil/x86/intreadwrite.h
> @@ -0,0 +1,92 @@
> +/*
> + * Copyright (c) 2010 Alexander Strange <astrange at ithinksw.com>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef AVUTIL_X86_INTREADWRITE_H
> +#define AVUTIL_X86_INTREADWRITE_H
> +
> +#include <stdint.h>
> +#include "config.h"
> +
> +#if !HAVE_FAST_64BIT && defined(__MMX__)
> +

> +#define AV_COPY64 AV_COPY64
> +static inline void AV_COPY64(void *d, const void *s)

are you sure this should not be always_inline?
2 plain 32bit reads +writes likely beat pushing 2 pointers on the stack and
calling pulling 2 pointers off 2 mmx for copy and return 

besides these things your patch looks pretty good

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored awnser#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100117/0000593f/attachment.pgp>