[FFmpeg-devel] [PATCH v3] avcodec/mathops: Optimize generic mid_pred function

Wed Mar 15 12:09:13 EET 2023

On Wed, Mar 8, 2023 at 04:45, Michael Niedermayer <michael at niedermayer.cc> wrote:
>
> On Tue, Mar 07, 2023 at 05:08:27PM +0800, Junxian Zhu wrote:
> > From: Junxian Zhu <zhujunxian at oss.cipunited.com>
> >
> > Rewrite the mid_pred function in the generic mathops.h to reduce branch jumps and improve performance. Because newer compilers can now generate code for this function that is as short as handwritten assembly, also remove the MIPS-specific inline-assembly version of mathops.h.
>
> You write that it improves performance;
> what speed effect does this have exactly?
> thx
>

I tested the performance using this code:
```
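#include <stdio.h>
#include <time.h>
#include <stdlib.h>

/* FFmpeg-style helpers; unused below but kept from the original test. */
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))
#define FFMAX(a, b) ((a) > (b) ? (a) : (b))

/* Build with -DOLD=1 for the original code; otherwise the new code is used. */
int mid_pred(int a, int b, int c)
{
#if OLD
    /* Original generic version: nested branches. */
    if(a>b){
        if(c>b){
            if(c>a) b=a;
            else    b=c;
        }
    }else{
        if(b>c){
            if(c>a) b=c;
            else    b=a;
        }
    }
    return b;
#else
    /* New version: min/max selects, intended to let the compiler avoid branches. */
    int t0, t1, t2, t3;
    t0 = (a > b) ? b : a;     /* min(a, b) */
    t1 = (a > b) ? a : b;     /* max(a, b) */
    t2 = (t0 > c) ? t0 : c;   /* max(min(a, b), c) */
    t3 = (t1 > t2) ? t2 : t1; /* min(max(a, b), t2), i.e. the median */
    return t3;
#endif
}

int main(void)
{
    int a[1024], b[1024], c[1024], d[1024];
    int j; /* declared here so it is still in scope for the printf below */

    srand(time(NULL));
    for (int i = 0; i < 1024; i++) {
        a[i] = rand();
        b[i] = rand();
        c[i] = rand();
    }
    /* rand() in the bound (re-evaluated each pass) keeps the trip count
     * from being a compile-time constant. */
    for (j = 0; j < 1e7 + rand() % 2; j++)
        for (int i = 0; i < 1024; i++)
            d[i] = mid_pred(a[i], b[i], c[i]);

    /* Print one result so the loop is not optimized away. */
    printf("%d, %d\n", d[rand() % 1024], j);
    return 0;
}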
```

Results (1e7 outer iterations unless noted):

| Platform | Compiler | Outer iterations | Old code | New code |
|---|---|---|---|---|
| Apple M1, macOS 13.2 | (not stated) | 1e7 | 2.1s | 2.3s |
| Cavium ThunderX (arm64) | GCC 10.2.1 -O3 | 1e7 | 52.7s | 37.8s |
| Loongson 3A4000 (mips64el) | GCC 10.2.1 -O3 | 1e7 | 90s | 5s |
| Intel Xeon CPU E7-4820 v4 @ 2.00GHz | GCC 10.2.1 -O3 | 1e7 | 14.4s | 15.4s |
| SF19A2890 (MIPS interAptiv) | GCC 10.2.1 -O3 | 1e7 | 314s | 39.3s |
| Intel Xeon CPU E7-4820 v4 @ 2.00GHz | GCC 12.2.0 -O3 | 1e7 | 14.4s | 8.8s |
| sifive,bullet0 (rv64imafdc) | GCC 12.2.0 -O3 | 1e6 | 11.9s | 15.2s |
| Freescale i.MX53, ARMv7 Processor rev 5 (v7l) | GCC 12.2.0 -O3 | 1e6 | 24.1s | 15.7s |
| POWER8 (architected), AltiVec, big-endian ppc64 | GCC 12.2.0 -O3 | 1e7 | 43.1s | 50.8s |
| POWER8 (architected), AltiVec, little-endian ppc64el | GCC 12.2.0 -O3 | 1e7 | 7.8s | 4.7s |
| PA8900 (Shortfin) PA-RISC | GCC 12.2.0 -O3 | 1e6 | 39.9s | 47.2s |
| IBM S/390 (s390x) | GCC 12.2.0 -O3 | 1e7 | 82.2s | 30.8s |
| Intel Itanium Processor 9320 | GCC 12.2.0 -O3 | 1e7 | 89.5s | 78.1s |
| Cavium Octeon III V0.2, FPU V0.0 (mipsel) | GCC 12.2.0 -O3 | 1e7 | 117.5s | 118.5s |


> [...]