[FFmpeg-devel] [PATCH] configure: replace arch loongson with arch extra list loongson
周晓勇
zhouxiaoyong at loongson.cn
Thu May 7 04:25:33 CEST 2015
> -----Original Message-----
> From: "Michael Niedermayer" <michael at niedermayer.cc>
> Sent: Wednesday, May 6, 2015
> To: "FFmpeg development discussions and patches" <ffmpeg-devel at ffmpeg.org>
> Cc: gaoxiang <gaoxiang at loongson.cn>, "孟小甫" <mengxiaofu at loongson.cn>
> Subject: Re: [FFmpeg-devel] [PATCH] configure: replace arch loongson with arch extra list loongson
>
> On Wed, May 06, 2015 at 02:38:21PM +0800, 周晓勇 wrote:
> > From a5031b4c4b97f790a40603cff9a1f45cbb043005 Mon Sep 17 00:00:00 2001
> > From: ZhouXiaoyong <zhouxiaoyong at loongson.cn>
> > Date: Wed, 6 May 2015 14:05:21 +0800
> > Subject: [PATCH] configure: replace arch loongson with arch extra list loongson
> >
> > FATE passes when configure is run without the --cc='ccache gcc' option:
> > ./configure --enable-gpl --enable-pthreads --samples=/home/loongson/fate/
> > --enable-nonfree --enable-version3 --assert-level=2 --cpu=loongson3a
> > --enable-loongson3
>
> with this ARCH_MIPS64 is disabled, is this intended ?
>
ARCH_MIPS64 is only used in libavutil/mips/intreadwrite.h, for AV_RN32. I meant not to disturb other MIPS64 machines: Loongson's optimizations may not be compatible with other MIPS64 CPUs until they have been tested, and I have no MIPS64 machine other than Loongson3 to test on.
In my personal git development branch I have optimized the other functions for Loongson-3, such as AV_WN32, AV_RN64, AV_WN64, AV_COPY32, AV_COPY64, AV_SWAP64, AV_ZERO32 and AV_ZERO64.
However, the speedup is smaller than anticipated. I will do more testing to make sure the optimized intreadwrite really is faster, and then send you the patch.
> why is "--enable-loongson3" needed when "--cpu=loongson3a" is already
> specified ?
>
It is not needed; I just added it to make sure the SIMD optimizations are enabled.
> and fate still fails
> time ./configure --enable-gpl --enable-pthreads --samples=/home/loongson/fate/ --enable-nonfree --enable-version3 --assert-level=2 --cpu=loongson3a --enable-loongson3
> real 4m48.779s
> user 4m13.918s
> sys 0m40.020s
>
> time make -j4
> real 19m31.114s
> user 57m52.785s
> sys 2m52.359s
>
> make -j5 fate-vsynth1-rv10 fate-vsynth1-svq1 fate-amrwb-23k85 fate-dss-lp fate-lavf-avi
>
> --- ./tests/ref/fate/dss-lp 2015-05-06 01:16:58.238387245 +0800
> +++ tests/data/fate/dss-lp 2015-05-06 20:15:23.060689405 +0800
> @@ -1,31 +1,31 @@
> #tb 0: 1/8000
> -0, 0, 0, 240, 480, 0xf1107658
> -0, 240, 240, 240, 480, 0x50dee179
> -0, 480, 480, 240, 480, 0x40090802
> -0, 720, 720, 240, 480, 0x3ef9f6ff
> -0, 960, 960, 240, 480, 0x5b7df231
> -0, 1200, 1200, 240, 480, 0xe266efd1
> -0, 1440, 1440, 240, 480, 0xfbe6e658
> -0, 1680, 1680, 240, 480, 0xde84f311
> -0, 1920, 1920, 240, 480, 0x5854ec2f
> -0, 2160, 2160, 240, 480, 0x4901cdea
> -0, 2400, 2400, 240, 480, 0x03f3e619
> -0, 2640, 2640, 240, 480, 0x47abfe87
> -0, 2880, 2880, 240, 480, 0x69dddf34
> -0, 3120, 3120, 240, 480, 0x1cfeee2c
> -0, 3360, 3360, 240, 480, 0x1860ef1c
> -0, 3600, 3600, 240, 480, 0x8f86e8ed
> -0, 3840, 3840, 240, 480, 0x307deaf8
> -0, 4080, 4080, 240, 480, 0xeca7eca0
> -0, 4320, 4320, 240, 480, 0x1835ee1c
> -0, 4560, 4560, 240, 480, 0x6676ed66
> -0, 4800, 4800, 240, 480, 0x49c2fd04
> -0, 5040, 5040, 240, 480, 0xc463db75
> -0, 5280, 5280, 240, 480, 0x1931ed7d
> -0, 5520, 5520, 240, 480, 0xc99ff886
> -0, 5760, 5760, 240, 480, 0xcd3ae8de
> -0, 6000, 6000, 240, 480, 0x2294ecfa
> -0, 6240, 6240, 240, 480, 0xcf5ef14b
> -0, 6480, 6480, 240, 480, 0x6325d4fe
> -0, 6720, 6720, 240, 480, 0x3790dcf2
> -0, 6960, 6960, 240, 480, 0x0fbee6c0
> +0, 0, 0, 240, 480, 0x4f3de452
> +0, 240, 240, 240, 480, 0x55d1f9da
> +0, 480, 480, 240, 480, 0xe887e1f6
> +0, 720, 720, 240, 480, 0xc353f768
> +0, 960, 960, 240, 480, 0x34adebcc
> +0, 1200, 1200, 240, 480, 0x7d67dfa2
> +0, 1440, 1440, 240, 480, 0xc7a4f1f4
> +0, 1680, 1680, 240, 480, 0x549cf083
> +0, 1920, 1920, 240, 480, 0x468dead7
> +0, 2160, 2160, 240, 480, 0x7e6af748
> +0, 2400, 2400, 240, 480, 0x02f20456
> +0, 2640, 2640, 240, 480, 0xb9d5eb37
> +0, 2880, 2880, 240, 480, 0x008cee35
> +0, 3120, 3120, 240, 480, 0xdd13f6c0
> +0, 3360, 3360, 240, 480, 0xaa0df718
> +0, 3600, 3600, 240, 480, 0x0a84ee9c
> +0, 3840, 3840, 240, 480, 0xaccfed94
> +0, 4080, 4080, 240, 480, 0x65c7f1bf
> +0, 4320, 4320, 240, 480, 0xda8cebed
> +0, 4560, 4560, 240, 480, 0x0ea4f747
> +0, 4800, 4800, 240, 480, 0x0feee8a6
> +0, 5040, 5040, 240, 480, 0x65d0de7d
> +0, 5280, 5280, 240, 480, 0xc986f146
> +0, 5520, 5520, 240, 480, 0x7886f3f5
> +0, 5760, 5760, 240, 480, 0x39a6eda8
> +0, 6000, 6000, 240, 480, 0x636af0b0
> +0, 6240, 6240, 240, 480, 0xdd2bfec3
> +0, 6480, 6480, 240, 480, 0x1baddcc4
> +0, 6720, 6720, 240, 480, 0x12cbef82
> +0, 6960, 6960, 240, 480, 0xbd11ee44
> Test dss-lp failed. Look at tests/data/fate/dss-lp.err for details.
> make: *** [fate-dss-lp] Error 1
> make: *** Waiting for unfinished jobs....
> stddev:32798.91 PSNR: 6.01 MAXDIFF:46621 bytes: 327680/ 327680
> stddev: |32798.91 - 0| >= 2
> Test amrwb-23k85 failed. Look at tests/data/fate/amrwb-23k85.err for details.
> make: *** [fate-amrwb-23k85] Error 1
>
I am working on it.
>
> also without explicitly specifying loongson:
>
> ./configure --enable-gpl --enable-pthreads --samples=/home/loongson/fate/ --enable-version3 --assert-level=2
> ...
> ./libavutil/libm.h:162:76: error: static declaration of ‘round’ follows non-static declaration
> static av_always_inline av_const double round(double x)
> ^
> ./libavutil/libm.h:169:75: error: static declaration of ‘roundf’ follows non-static declaration
> static av_always_inline av_const float roundf(float x)
> ^
> ./libavutil/libm.h:176:76: error: static declaration of ‘trunc’ follows non-static declaration
> static av_always_inline av_const double trunc(double x)
> ^
> ./libavutil/libm.h:183:75: error: static declaration of ‘truncf’ follows non-static declaration
> static av_always_inline av_const float truncf(float x)
> ^
> make: *** [libavdevice/alldevices.o] Error 1
>
> detection of round() failed with this:
>
> /usr/bin/ld: /tmp/ffconf.S1BUH3UB.o: linking mips:isa32r2 module with previous mips:4000 modules
> /usr/bin/ld: failed to merge target specific data of file /tmp/ffconf.S1BUH3UB.o
> /tmp/ffconf.S1BUH3UB.o: In function `foo':
> ffconf.HgZd30xA.c:(.text+0x3c): undefined reference to `round'
> collect2: error: ld returned 1 exit status
>
I will send a patch soon.
>
>
> >
> > Signed-off-by: ZhouXiaoyong <zhouxiaoyong at loongson.cn>
> > ---
> > configure | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/configure b/configure
> > index d3f23c8..0f79874 100755
> > --- a/configure
> > +++ b/configure
> > @@ -1577,6 +1577,9 @@ ARCH_EXT_LIST_MIPS="
> > mipsdspr1
> > mipsdspr2
> > msa
> > +"
> > +
> > +ARCH_EXT_LIST_LOONGSON="
> > loongson3
>
> why would this be in a separate list ?
> the various ARM variants are also not in separate lists
>
Loongson has developed more useful MMI (multimedia instructions); imgtec may call them vector instructions. In the long term, we will fill ARCH_EXT_LIST_LOONGSON or ARCH_EXT_LIST_LOONGSON_SIMD with flags like MMX, AVX, SSE...
You may not know that the various Loongson-3 CPU cores have more instructions than MIPS64R2, so a separate list is better.
By the way, ARCH_EXT_LIST_ARM already exists, doesn't it?
ARCH_EXT_LIST_ARM="
armv5te
armv6
armv6t2
armv8
neon
vfp
vfpv3
setend
"
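
To illustrate the point, a separate list costs little if it is concatenated into the overall extension list. This is a minimal shell sketch and not the actual configure code: it assumes the combined variable is named ARCH_EXT_LIST, and it uses only the entries visible in the patch hunk above (the real lists in configure are longer).

```shell
# Trimmed-down lists for illustration only; entries taken from the patch hunk.
ARCH_EXT_LIST_MIPS="
    mipsdspr1
    mipsdspr2
    msa
"

ARCH_EXT_LIST_LOONGSON="
    loongson3
"

# Assumed combined list: each per-vendor list is simply expanded in place.
ARCH_EXT_LIST="
    $ARCH_EXT_LIST_MIPS
    $ARCH_EXT_LIST_LOONGSON
"

# Unquoted expansion word-splits the list, dropping newlines and indentation.
for ext in $ARCH_EXT_LIST; do
    echo "extension: $ext"
done
```

With this layout, later Loongson-specific flags would be added to ARCH_EXT_LIST_LOONGSON alone, without touching the generic MIPS list.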