[FFmpeg-devel] [PATCH] Optimization of MP3 decoders for MIPS
Babic, Nedeljko
nbabic at mips.com
Thu Jun 21 12:12:04 CEST 2012
Hello Vitor,
In this optimization we also tried to be bit-exact with the C code for floating point (as in AMR).
This optimization was written before I got your explanation that FP code in FFmpeg
doesn't need to be bit-exact.
You are correct that we could use many more madds and msubs here (if bit-exactness
is disregarded).
I will rewrite this and send a new patch.
Thanks,
-Nedeljko
________________________________________
From: Vitor Sessak [vitor1001 at gmail.com]
Sent: Wednesday, June 20, 2012 1:35
To: FFmpeg development discussions and patches
Cc: Babic, Nedeljko; Lukac, Zeljko
Subject: Re: [FFmpeg-devel] [PATCH] Optimization of MP3 decoders for MIPS
On 06/11/2012 05:07 PM, Nedeljko Babic wrote:
> MP3 fixed and floating point decoders are optimized
> for MIPS architecture.
I gave it a look and have just one comment:
> diff --git a/libavcodec/mips/mpegaudiodsp_mips_float.c b/libavcodec/mips/mpegaudiodsp_mips_float.c
> new file mode 100644
> index 0000000..d4a41af
> --- /dev/null
> +++ b/libavcodec/mips/mpegaudiodsp_mips_float.c
> @@ -0,0 +1,1307 @@
> +/*
> + * Copyright (c) 2012
> + * MIPS Technologies, Inc., California.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of the MIPS Technologies, Inc., nor the names of its
> + * contributors may be used to endorse or promote products derived from
> + * this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE MIPS TECHNOLOGIES, INC. ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE MIPS TECHNOLOGIES, INC. BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + *
> + * Author: Bojan Zivkovic (bojan at mips.com)
> + *
> + * MPEG Audio decoder optimized for MIPS floating-point architecture
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * Reference: libavcodec/mpegaudiodsp_template.c
> + * libavcodec/dct32.c
> + */
> +
> +#include<string.h>
> +
> +#include "libavcodec/mpegaudiodsp.h"
> +
> +static void ff_dct32_mips_float(float *out, const float *tab)
> +{
> + float val0 , val1 , val2 , val3 , val4 , val5 , val6 , val7,
> + val8 , val9 , val10, val11, val12, val13, val14, val15,
> + val16, val17, val18, val19, val20, val21, val22, val23,
> + val24, val25, val26, val27, val28, val29, val30, val31;
> +
> + /**
> + * instructions are scheduled to minimize pipeline stall.
> + */
> + __asm__ __volatile__ (
> + /* pass 1 */
> + "lwc1 $f0, 0(%[tab]) \n\t"
> + "lwc1 $f1, 31*4(%[tab]) \n\t"
> + "lwc1 $f2, 15*4(%[tab]) \n\t"
> + "lwc1 $f3, 16*4(%[tab]) \n\t"
> + "lwc1 $f6, 7*4(%[tab]) \n\t"
> + "lwc1 $f7, 24*4(%[tab]) \n\t"
> + "lwc1 $f8, 8*4(%[tab]) \n\t"
> + "lwc1 $f9, 23*4(%[tab]) \n\t"
> + "li.s $f4, 0.50060299823519630134 \n\t"
> + "li.s $f5, 10.19000812354805681150 \n\t"
> + "add.s %[val0], $f0, $f1 \n\t"
> + "add.s %[val15], $f2, $f3 \n\t"
> + "sub.s $f0, $f0, $f1 \n\t"
> + "sub.s $f1, $f2, $f3 \n\t"
> + "li.s $f10, 0.67480834145500574602 \n\t"
> + "li.s $f11, 0.74453627100229844977 \n\t"
> + "sub.s $f2, $f6, $f7 \n\t"
> + "sub.s $f3, $f8, $f9 \n\t"
> + "add.s %[val7], $f6, $f7 \n\t"
> + "add.s %[val8], $f8, $f9 \n\t"
> + "mul.s %[val31], $f0, $f4 \n\t"
> + "mul.s %[val16], $f1, $f5 \n\t"
> + "mul.s %[val24], $f2, $f10 \n\t"
> + "mul.s %[val23], $f3, $f11 \n\t"
> +
> + /* pass 2 */
> + "li.s $f4, 0.50241928618815570551 \n\t"
> + "li.s $f5, -0.50241928618815570551 \n\t"
> + "sub.s $f0, %[val0], %[val15] \n\t"
> + "sub.s $f1, %[val16], %[val31] \n\t"
> + "add.s %[val0], %[val0], %[val15] \n\t"
> + "add.s %[val16], %[val16], %[val31] \n\t"
> + "li.s $f10, 5.10114861868916385802 \n\t"
> + "li.s $f11, -5.10114861868916385802 \n\t"
> + "sub.s $f2, %[val7], %[val8] \n\t"
> + "sub.s $f3, %[val23], %[val24] \n\t"
> + "mul.s %[val15], $f0, $f4 \n\t"
> + "mul.s %[val31], $f1, $f5 \n\t"
> + "add.s %[val7], %[val7], %[val8] \n\t"
> + "add.s %[val23], %[val23], %[val24] \n\t"
> + "mul.s %[val8], $f2, $f10 \n\t"
> + "mul.s %[val24], $f3, $f11 \n\t"
> +
> + /* pass 3 */
> + "li.s $f4, 0.50979557910415916894 \n\t"
> + "li.s $f5, -0.50979557910415916894 \n\t"
> + "sub.s $f0, %[val0], %[val7] \n\t"
> + "sub.s $f1, %[val8], %[val15] \n\t"
> + "sub.s $f2, %[val16], %[val23] \n\t"
> + "sub.s $f3, %[val24], %[val31] \n\t"
> + "add.s %[val0], %[val0], %[val7] \n\t"
> + "add.s %[val8], %[val8], %[val15] \n\t"
> + "add.s %[val16], %[val16], %[val23] \n\t"
> + "add.s %[val24], %[val24], %[val31] \n\t"
> + "mul.s %[val7], $f0, $f4 \n\t"
> + "mul.s %[val15], $f1, $f5 \n\t"
> + "mul.s %[val23], $f2, $f4 \n\t"
> + "mul.s %[val31], $f3, $f5 \n\t"
> +
> + : [val0] "=f" (val0), [val7] "=f" (val7),
> + [val8] "=f" (val8), [val15] "=f" (val15),
> + [val16] "=f" (val16), [val23] "=f" (val23),
> + [val24] "=f" (val24), [val31] "=f" (val31)
> + : [tab] "r" (tab)
> + : "$f0", "$f1", "$f2", "$f3", "$f4", "$f5",
> + "$f6", "$f7", "$f8", "$f9", "$f10", "$f11"
> + );
I think you can rewrite this function to use way more fused
multiply-adds/subs.
-Vitor