[FFmpeg-devel] [PATCH] some SIMD write-combining for h264
Michael Niedermayer
michaelni
Sun Jan 17 21:52:08 CET 2010
On Sun, Jan 17, 2010 at 04:59:55AM -0500, Alexander Strange wrote:
>
> On Jan 16, 2010, at 9:59 PM, M?ns Rullg?rd wrote:
>
> > Alexander Strange <astrange at ithinksw.com> writes:
> >
> >> On Jan 16, 2010, at 12:35 AM, Michael Niedermayer wrote:
> >>
> >>> On Fri, Jan 15, 2010 at 11:11:23PM -0500, Alexander Strange wrote:
> >>>> This adds intreadwrite macros for 64/128-bit memory operations and uses them in h264.
> >>>>
> >>>> Unlike the other macros, these assume correct alignment, and the patch only defines the ones there was an immediate use for.
> >>>> This only has x86 versions, but others should be easy. The 64-bit operations can be done with double copies on most systems, I guess.
> >>>>
> >>>> Decoding a 30s file on Core 2 Merom with --cpu=core2 (minimum of 5 runs):
> >>>> x86-32: 12.72s before, 12.51s after (1.7%)
> >>>> x86-64: 10.24s before, 10.20s after (.4%)
> >>>>
> >>>> Tested on x86-32, x86-64, x86-32 with --arch=c.
> >>>
> >>> as your code uses MMX you need to at least mention EMMS/float issue in the
> >>> dox and probably a emms_c(); call before draw_horiz_band()
> >>> dunno if these are all
> >>
> >> Added in the comment.
> >>
> >>> also what sets __MMX__ ? we have our own defines for that
> >>
> >> It's a gcc builtin define, set based on ./configure --cpu=x adding
> >> -march. HAVE_MMX is for the build and not the host cpu family, and
> >> this is inlined asm, so it can't use it.
> >
> > Huh? Host... build???
>
> Oh, that was supposed to be "target"...
> Anyway, this is MMX being used like the cmov/clz inlines, so it depends on the given --cpu and not on the build system's capabilities.
do all compilers set __MMX__ ?
if one sets it wrong what will you do? we have no busines defining any __
[...]
>
> >> +#endif /* __SSE__ */
> >> +
> >> +#if __SSE2__
> >
> > Use #ifdef.
> >
> >> +#define AV_ZERO128 AV_ZERO128
> >> +static inline void AV_ZERO128(void *d)
> >> +{
> >> + typedef struct {uint64_t i[2];} v;
> >> +
> >> + __asm__("pxor %%xmm0, %%xmm0 \n\t"
> >> + "movdqa %%xmm0, %0 \n\t"
> >> + : "=m"(*(v*)d)
> >> + :: "xmm0");
> >> +}
> >
> >
> > Same as above about the typedef.
> >
> >> +#endif /* __SSE2__ */
> >> +
> >> +#endif /* AVUTIL_X86_INTREADWRITE_H */
> >> --
> >> 1.6.5.2
>
> intreadwrite.h | 47 +++++++++++++++++++++++++--
> x86/intreadwrite.h | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 137 insertions(+), 2 deletions(-)
> d2af02fbeb5e0527ca4e31270370651164fd8c8b 0001-Add-macros-for-write-combining-optimization.patch
> From 5a824eba882e0518d3892eb39d089f262b21c6b4 Mon Sep 17 00:00:00 2001
> From: Alexander Strange <astrange at ithinksw.com>
> Date: Fri, 15 Jan 2010 01:47:47 -0500
> Subject: [PATCH 1/2] Add macros for write-combining optimization.
>
> ---
> libavutil/intreadwrite.h | 47 ++++++++++++++++++++-
> libavutil/x86/intreadwrite.h | 92 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 137 insertions(+), 2 deletions(-)
> create mode 100644 libavutil/x86/intreadwrite.h
>
> diff --git a/libavutil/intreadwrite.h b/libavutil/intreadwrite.h
> index 933732c..15481aa 100644
> --- a/libavutil/intreadwrite.h
> +++ b/libavutil/intreadwrite.h
> @@ -25,8 +25,9 @@
>
> /*
> * Arch-specific headers can provide any combination of
> - * AV_[RW][BLN](16|24|32|64) macros. Preprocessor symbols must be
> - * defined, even if these are implemented as inline functions.
> + * AV_[RW][BLN](16|24|32|64) and AV_(COPY|SWAP|ZERO)(64|128) macros.
> + * Preprocessor symbols must be defined, even if these are implemented
> + * as inline functions.
> */
>
> #if ARCH_ARM
> @@ -37,6 +38,8 @@
> # include "mips/intreadwrite.h"
> #elif ARCH_PPC
> # include "ppc/intreadwrite.h"
> +#elif ARCH_X86
> +# include "x86/intreadwrite.h"
> #endif
>
> /*
> @@ -397,4 +400,44 @@ struct unaligned_16 { uint16_t l; } __attribute__((packed));
> } while(0)
> #endif
>
> +/* Parameters for AV_COPY*, AV_SWAP*, AV_ZERO* must be
> + * naturally aligned. They may be implemented using MMX,
> + * so emms_c() must be called before using any float code
> + * afterwards.
> + */
> +
> +#define AV_COPY(n, d, s) (*(uint##n##_t*)(d) = *(const uint##n##_t*)(s))
> +
> +#ifndef AV_COPY64
> +# define AV_COPY64(d, s) AV_COPY(64, d, s)
> +#endif
> +
> +#ifndef AV_COPY128
> +# define AV_COPY128(d, s) \
> + do { \
> + AV_COPY64(d, s); \
> + AV_COPY64((char*)(d)+8, (char*)(s)+8); \
> + } while(0)
> +#endif
> +
> +#define AV_SWAP(n, a, b) FFSWAP(uint##n##_t, *(uint##n##_t*)(a), *(uint##n##_t*)(b))
> +
> +#ifndef AV_SWAP64
> +# define AV_SWAP64(a, b) AV_SWAP(64, a, b)
> +#endif
> +
> +#define AV_ZERO(n, d) (*(uint##n##_t*)(d) = 0)
> +
> +#ifndef AV_ZERO64
> +# define AV_ZERO64(d) AV_ZERO(64, d)
> +#endif
> +
> +#ifndef AV_ZERO128
> +# define AV_ZERO128(d) \
> + do { \
> + AV_ZERO64(d); \
> + AV_ZERO64((char*)(d)+8); \
> + } while(0)
> +#endif
> +
> #endif /* AVUTIL_INTREADWRITE_H */
> diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h
> new file mode 100644
> index 0000000..296ecec
> --- /dev/null
> +++ b/libavutil/x86/intreadwrite.h
> @@ -0,0 +1,92 @@
> +/*
> + * Copyright (c) 2010 Alexander Strange <astrange at ithinksw.com>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef AVUTIL_X86_INTREADWRITE_H
> +#define AVUTIL_X86_INTREADWRITE_H
> +
> +#include <stdint.h>
> +#include "config.h"
> +
> +#if !HAVE_FAST_64BIT && defined(__MMX__)
> +
> +#define AV_COPY64 AV_COPY64
> +static inline void AV_COPY64(void *d, const void *s)
are you sure this should not be always_inline?
2 plain 32bit reads +writes likely beat pushing 2 pointers on the stack and
calling pulling 2 pointers off 2 mmx for copy and return
besides these things your patch looks pretty good
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Frequently ignored awnser#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100117/0000593f/attachment.pgp>
More information about the ffmpeg-devel
mailing list