[FFmpeg-devel] A ppc patch to fix the fft bug in little endian environment for POWER
Michael Niedermayer
michaelni at gmx.at
Wed Jun 11 14:55:09 CEST 2014
On Wed, Jun 11, 2014 at 05:58:25PM +0800, Grace Ryan wrote:
> Hi,
>
> I ran the FATE tests on POWER7 little endian and found errors caused by
> the PPC fft implementation, which was written for big endian. This patch
> computes the fft with GCC VSX intrinsics, so that GCC handles the
> endianness differences automatically:
>
> 1. Check the CPU flag; if it is POWER7/POWER8, add -mcpu=power7 or
> -mcpu=power8 in the configure file
> 2. If it is POWER7/POWER8, enable the macro HAVE_VSX in the configure file
> 3. Add fft_vsx.c and fft_vsx.h under the folder ./libavcodec/ppc; when
> HAVE_VSX is enabled, use these two files to calculate the fft in either
> a big endian or a little endian environment.
>
> The FATE test results can be found on http://fate.ffmpeg.org/ by searching
> for "ibmcrl"; they are also attached here to facilitate the review.
>
> The patch file is also attached. Thanks.
>
> Regards,
> Rong Yan
> configure | 32 +
> libavcodec/ppc/Makefile | 1
> libavcodec/ppc/fft_altivec.c | 12
> libavcodec/ppc/fft_vsx.c | 229 +++++++++++
> libavcodec/ppc/fft_vsx.h | 830 +++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 1104 insertions(+)
> cb6802ef377e28a1e11cd8cbdf623b6f7ab60d9c 0001-ppc-fix-the-bug-of-fft-for-little-endian-Environment.patch
> From 2ab345ca4011aa067dd196cec3b52d87173030a2 Mon Sep 17 00:00:00 2001
> From: Rong Yan <rongyan236 at gmail.com>
> Date: Wed, 11 Jun 2014 04:46:46 -0400
> Subject: [PATCH 1/1] ppc: fix the bug of fft for little endian Environment on POWER7 and later
>
> ---
> configure | 32 ++
> libavcodec/ppc/Makefile | 1 +
> libavcodec/ppc/fft_altivec.c | 12 +
> libavcodec/ppc/fft_vsx.c | 229 ++++++++++++
> libavcodec/ppc/fft_vsx.h | 830 ++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 1104 insertions(+), 0 deletions(-)
> create mode 100644 libavcodec/ppc/fft_vsx.c
> create mode 100644 libavcodec/ppc/fft_vsx.h
>
> diff --git a/configure b/configure
> index fa66c4f..7672e99 100755
> --- a/configure
> +++ b/configure
> @@ -1547,6 +1547,7 @@ ARCH_EXT_LIST_PPC="
> dcbzl
> ldbrx
> ppc4xx
> + vsx
> "
>
> ARCH_EXT_LIST_X86="
> @@ -1931,6 +1932,7 @@ mipsdspr2_deps="mips"
>
> altivec_deps="ppc"
> ppc4xx_deps="ppc"
> +vsx_deps="ppc"
>
> cpunop_deps="i686"
> x86_64_select="i686"
> @@ -2629,6 +2631,7 @@ else
> arch_default=$(uname -m)
> fi
> cpu="generic"
> +ppccpu=$(cat /proc/cpuinfo | grep "cpu" | cut -f2 -d: | uniq -c | awk '{print $2}')
>
> # configurable options
> enable $PROGRAM_LIST
This would fail for cross compilation, because /proc/cpuinfo describes the
build machine rather than the target. Please find a different solution.
Best would be something similar to how other (non-PPC) CPUs are handled.
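
One way to avoid probing the build machine is to derive the settings from
the CPU name the user passes in. The following is only a minimal sketch
under assumptions: the function name ppc_cpu_settings and the variables
cpuflags and vsx are hypothetical and are not taken from the patch or from
the actual configure script.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): choose PPC compiler flags from
# a CPU name supplied by the user, e.g. through a --cpu=... option, instead
# of reading the build host's /proc/cpuinfo. This also works when cross
# compiling, because nothing about the build machine is inspected.
ppc_cpu_settings() {
    # Normalise the name so that "POWER8" and "power8" are treated alike.
    cpu_name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$cpu_name" in
        power7|power8)
            cpuflags="-mcpu=$cpu_name"
            vsx="yes"
        ;;
        *)
            # Unknown or generic CPU: add no flags and leave VSX disabled.
            cpuflags=""
            vsx="no"
        ;;
    esac
}

ppc_cpu_settings "POWER8"
echo "cpuflags=$cpuflags vsx=$vsx"
```

With this shape the user states the target explicitly, and a generic or
unknown CPU name simply leaves VSX off.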
[...]
> diff --git a/libavcodec/ppc/fft_vsx.h b/libavcodec/ppc/fft_vsx.h
> new file mode 100644
> index 0000000..37d7cda
> --- /dev/null
> +++ b/libavcodec/ppc/fft_vsx.h
> @@ -0,0 +1,830 @@
> +#ifndef AVCODEC_PPC_FFT_VSX_H
> +#define AVCODEC_PPC_FFT_VSX_H
> +/*
> + * FFT transform, optimized with VSX built-in functions
> + * Copyright (c) 2014 Rong Yan
> + *
Is this based on libavcodec/ppc/fft_altivec_s.S?
If so, please add the copyright statement from that file to the new file,
in addition to the new copyright line.
Otherwise it should be fine as is.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact