[FFmpeg-devel] A ppc patch to fix the fft bug in little endian environment for POWER

Wed Jun 11 14:55:09 CEST 2014

On Wed, Jun 11, 2014 at 05:58:25PM +0800, Grace Ryan wrote:
> Hi,
> 
> I ran the FATE tests on POWER7 little endian and found errors caused by the
> ppc fft implementation, which was written for big endian. This patch
> computes the fft with GCC VSX intrinsics instead, so that the endianness
> problem is handled automatically by GCC:
> 
> 1. Check the cpu flag, if it is POWER7/POWER8, add -mcpu=power7 or
> -mcpu=power8 in the configure file
> 2. If it is POWER7/POWER8, enable the macro HAVE_VSX in the configure file
> 3. Add fft_vsx.c and fft_vsx.h under the folder ./libavcodec/ppc. When
> HAVE_VSX is enabled, use these two files to calculate the fft in both
> big endian and little endian environments.
> 
> The FATE test results can be found on http://fate.ffmpeg.org/ by searching
> for "ibmcrl"; they are also attached here to facilitate the review:
> 
> The patch file is also attached. Thanks.
> 
> Regards,
> Rong Yan

>  configure                    |   32 +
>  libavcodec/ppc/Makefile      |    1 
>  libavcodec/ppc/fft_altivec.c |   12 
>  libavcodec/ppc/fft_vsx.c     |  229 +++++++++++
>  libavcodec/ppc/fft_vsx.h     |  830 +++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 1104 insertions(+)
> cb6802ef377e28a1e11cd8cbdf623b6f7ab60d9c  0001-ppc-fix-the-bug-of-fft-for-little-endian-Environment.patch
> From 2ab345ca4011aa067dd196cec3b52d87173030a2 Mon Sep 17 00:00:00 2001
> From: Rong Yan <rongyan236 at gmail.com>
> Date: Wed, 11 Jun 2014 04:46:46 -0400
> Subject: [PATCH 1/1] ppc: fix the bug of fft for little endian Environment on POWER7 and later
> 
> ---
>  configure                    |   32 ++
>  libavcodec/ppc/Makefile      |    1 +
>  libavcodec/ppc/fft_altivec.c |   12 +
>  libavcodec/ppc/fft_vsx.c     |  229 ++++++++++++
>  libavcodec/ppc/fft_vsx.h     |  830 ++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 1104 insertions(+), 0 deletions(-)
>  create mode 100644 libavcodec/ppc/fft_vsx.c
>  create mode 100644 libavcodec/ppc/fft_vsx.h
> 
> diff --git a/configure b/configure
> index fa66c4f..7672e99 100755
> --- a/configure
> +++ b/configure
> @@ -1547,6 +1547,7 @@ ARCH_EXT_LIST_PPC="
>      dcbzl
>      ldbrx
>      ppc4xx
> +    vsx
>  "
>  
>  ARCH_EXT_LIST_X86="
> @@ -1931,6 +1932,7 @@ mipsdspr2_deps="mips"
>  
>  altivec_deps="ppc"
>  ppc4xx_deps="ppc"
> +vsx_deps="ppc"
>  
>  cpunop_deps="i686"
>  x86_64_select="i686"

> @@ -2629,6 +2631,7 @@ else
>      arch_default=$(uname -m)
>  fi
>  cpu="generic"
> +ppccpu=$(cat /proc/cpuinfo | grep "cpu" | cut -f2 -d: | uniq -c | awk '{print $2}')
>  
>  # configurable options
>  enable $PROGRAM_LIST

This would fail for cross compilation, because /proc/cpuinfo describes the
build machine rather than the target. Please find a different solution.
Best would be something similar to how other (non-ppc) CPUs are handled.
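
As a rough sketch of that idea only: the flags could be derived from the CPU
name the user passes via --cpu instead of from the build host. The function
names ppc_cpuflags and ppc_has_vsx below are hypothetical and do not exist in
configure, and the set of accepted CPU spellings is an assumption.

```shell
# Hypothetical helpers, not part of configure: choose flags from the
# requested target CPU name, never from the machine running configure.

# Print the -mcpu flag for a given target CPU name (empty if unknown).
ppc_cpuflags() {
    case "$1" in
        power7|pwr7) echo "-mcpu=power7" ;;
        power8|pwr8) echo "-mcpu=power8" ;;
        *)           echo "" ;;
    esac
}

# Succeed if the given target CPU name is assumed to provide VSX.
ppc_has_vsx() {
    case "$1" in
        power7|pwr7|power8|pwr8) return 0 ;;
        *)                       return 1 ;;
    esac
}
```

With something like this, a cross build that passes --cpu=power8 would get
-mcpu=power8 and VSX enabled regardless of what the build host is, and an
unknown or generic CPU would leave both untouched.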

[...]

> diff --git a/libavcodec/ppc/fft_vsx.h b/libavcodec/ppc/fft_vsx.h
> new file mode 100644
> index 0000000..37d7cda
> --- /dev/null
> +++ b/libavcodec/ppc/fft_vsx.h
> @@ -0,0 +1,830 @@
> +#ifndef AVCODEC_PPC_FFT_VSX_H
> +#define AVCODEC_PPC_FFT_VSX_H
> +/*
> + * FFT  transform, optimized with VSX built-in functions
> + * Copyright (c) 2014 Rong Yan
> + *

Is this based on libavcodec/ppc/fft_altivec_s.S?
If so, please add the copyright statement from that file to the new file,
in addition to your own.
Otherwise it should be fine as is.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact
solution.