[FFmpeg-devel] [PATCH] Optimization of MP3 decoders for MIPS

Thu Jun 21 12:12:04 CEST 2012

Hello Vitor,

In this optimization we also tried to stay bit exact with the C code for floating
point (as in AMR). It was written before I got your explanation that FP code in
FFmpeg doesn't need to be bit exact.

You are correct that we could use many more madds and msubs here (if bit exactness
is disregarded).
I will rewrite this and send a new patch.

Thanks,
-Nedeljko
________________________________________
From: Vitor Sessak [vitor1001 at gmail.com]
Sent: Wednesday, June 20, 2012 1:35
To: FFmpeg development discussions and patches
Cc: Babic, Nedeljko; Lukac, Zeljko
Subject: Re: [FFmpeg-devel] [PATCH] Optimization of MP3 decoders for MIPS

On 06/11/2012 05:07 PM, Nedeljko Babic wrote:
> MP3 fixed and floating point decoders are optimized
>   for MIPS architecture.

I gave it a look and have just one comment:

> diff --git a/libavcodec/mips/mpegaudiodsp_mips_float.c b/libavcodec/mips/mpegaudiodsp_mips_float.c
> new file mode 100644
> index 0000000..d4a41af
> --- /dev/null
> +++ b/libavcodec/mips/mpegaudiodsp_mips_float.c
> @@ -0,0 +1,1307 @@
> +/*
> + * Copyright (c) 2012
> + *      MIPS Technologies, Inc., California.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of the MIPS Technologies, Inc., nor the names of its
> + *    contributors may be used to endorse or promote products derived from
> + *    this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE MIPS TECHNOLOGIES, INC. ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL THE MIPS TECHNOLOGIES, INC. BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + *
> + * Author:  Bojan Zivkovic (bojan at mips.com)
> + *
> + * MPEG Audio decoder optimized for MIPS floating-point architecture
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * Reference: libavcodec/mpegaudiodsp_template.c
> + *            libavcodec/dct32.c
> + */
> +
> +#include<string.h>
> +
> +#include "libavcodec/mpegaudiodsp.h"
> +

> +static void ff_dct32_mips_float(float *out, const float *tab)
> +{
> +    float val0 , val1 , val2 , val3 , val4 , val5 , val6 , val7,
> +          val8 , val9 , val10, val11, val12, val13, val14, val15,
> +          val16, val17, val18, val19, val20, val21, val22, val23,
> +          val24, val25, val26, val27, val28, val29, val30, val31;
> +
> +    /**
> +    * instructions are scheduled to minimize pipeline stall.
> +    */
> +    __asm__ __volatile__ (
> +        /* pass 1 */
> +        "lwc1   $f0,      0(%[tab])                          \n\t"
> +        "lwc1   $f1,      31*4(%[tab])                       \n\t"
> +        "lwc1   $f2,      15*4(%[tab])                       \n\t"
> +        "lwc1   $f3,      16*4(%[tab])                       \n\t"
> +        "lwc1   $f6,      7*4(%[tab])                        \n\t"
> +        "lwc1   $f7,      24*4(%[tab])                       \n\t"
> +        "lwc1   $f8,      8*4(%[tab])                        \n\t"
> +        "lwc1   $f9,      23*4(%[tab])                       \n\t"
> +        "li.s   $f4,      0.50060299823519630134             \n\t"
> +        "li.s   $f5,      10.19000812354805681150            \n\t"
> +        "add.s  %[val0],  $f0,      $f1                      \n\t"
> +        "add.s  %[val15], $f2,      $f3                      \n\t"
> +        "sub.s  $f0,      $f0,      $f1                      \n\t"
> +        "sub.s  $f1,      $f2,      $f3                      \n\t"
> +        "li.s   $f10,     0.67480834145500574602             \n\t"
> +        "li.s   $f11,     0.74453627100229844977             \n\t"
> +        "sub.s  $f2,      $f6,      $f7                      \n\t"
> +        "sub.s  $f3,      $f8,      $f9                      \n\t"
> +        "add.s  %[val7],  $f6,      $f7                      \n\t"
> +        "add.s  %[val8],  $f8,      $f9                      \n\t"
> +        "mul.s  %[val31], $f0,      $f4                      \n\t"
> +        "mul.s  %[val16], $f1,      $f5                      \n\t"
> +        "mul.s  %[val24], $f2,      $f10                     \n\t"
> +        "mul.s  %[val23], $f3,      $f11                     \n\t"
> +
> +        /* pass 2 */
> +        "li.s   $f4,      0.50241928618815570551             \n\t"
> +        "li.s   $f5,      -0.50241928618815570551            \n\t"
> +        "sub.s  $f0,      %[val0],  %[val15]                 \n\t"
> +        "sub.s  $f1,      %[val16], %[val31]                 \n\t"
> +        "add.s  %[val0],  %[val0],  %[val15]                 \n\t"
> +        "add.s  %[val16], %[val16], %[val31]                 \n\t"
> +        "li.s   $f10,     5.10114861868916385802             \n\t"
> +        "li.s   $f11,     -5.10114861868916385802            \n\t"
> +        "sub.s  $f2,      %[val7],  %[val8]                  \n\t"
> +        "sub.s  $f3,      %[val23], %[val24]                 \n\t"
> +        "mul.s  %[val15], $f0,      $f4                      \n\t"
> +        "mul.s  %[val31], $f1,      $f5                      \n\t"
> +        "add.s  %[val7],  %[val7],  %[val8]                  \n\t"
> +        "add.s  %[val23], %[val23], %[val24]                 \n\t"
> +        "mul.s  %[val8],  $f2,      $f10                     \n\t"
> +        "mul.s  %[val24], $f3,      $f11                     \n\t"
> +
> +        /* pass 3 */
> +        "li.s   $f4,      0.50979557910415916894             \n\t"
> +        "li.s   $f5,      -0.50979557910415916894            \n\t"
> +        "sub.s  $f0,      %[val0],  %[val7]                  \n\t"
> +        "sub.s  $f1,      %[val8],  %[val15]                 \n\t"
> +        "sub.s  $f2,      %[val16], %[val23]                 \n\t"
> +        "sub.s  $f3,      %[val24], %[val31]                 \n\t"
> +        "add.s  %[val0],  %[val0],  %[val7]                  \n\t"
> +        "add.s  %[val8],  %[val8],  %[val15]                 \n\t"
> +        "add.s  %[val16], %[val16], %[val23]                 \n\t"
> +        "add.s  %[val24], %[val24], %[val31]                 \n\t"
> +        "mul.s  %[val7],  $f0,      $f4                      \n\t"
> +        "mul.s  %[val15], $f1,      $f5                      \n\t"
> +        "mul.s  %[val23], $f2,      $f4                      \n\t"
> +        "mul.s  %[val31], $f3,      $f5                      \n\t"
> +
> +        : [val0]  "=f" (val0),  [val7]  "=f" (val7),
> +          [val8]  "=f" (val8),  [val15] "=f" (val15),
> +          [val16] "=f" (val16), [val23] "=f" (val23),
> +          [val24] "=f" (val24), [val31] "=f" (val31)
> +        : [tab]   "r"  (tab)
> +        : "$f0", "$f1", "$f2", "$f3", "$f4", "$f5",
> +          "$f6", "$f7", "$f8", "$f9", "$f10", "$f11"
> +    );

I think you can rewrite this function to use way more fused
multiply-adds/subs.

-Vitor