[MPlayer-dev-eng] [OT] C-code Optimization Contest
Raindel Shachar
raindel at techunix.technion.ac.il
Wed Jul 16 00:33:33 CEST 2003
Hi, here is my entry, optimized for the Pentium 4 only (15x faster on a
Pentium 4, 4x faster on a Pentium 3):
Compile with the following command:
gcc -O3 -ffast-math -fomit-frame-pointer -mcpu=i686 -march=i686 \
    -malign-double -mrtd -fstrict-aliasing matrix.c multiply_d.c -o matrix && \
    strip matrix
multiply_d.c:
---- Cut here -----
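/* simple matrix multiply: c += a * b (c is accumulated into, not cleared here) */
#include "multiply_d.h"
void multiply(double a[][DIM], double b[][DIM], double c[][DIM]) {
    register double *tmp2;  /* walks along row i of c */
    double tmp;             /* holds a[i][k] for the inner loop */
    register int j, k, i;
    tmp2 = c[0];
    for (i = 0; i < NUM; i++) {
        for (k = 0; k < NUM; k++) {
            tmp = a[i][k];
            for (j = 0; j < NUM; j++) {
                *tmp2 = *tmp2 + tmp * b[k][j];
                tmp2++;
            }
            tmp2 -= NUM;  /* rewind to the start of row i */
        }
        tmp2 += DIM;      /* advance to row i + 1 */
    }
}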
---- Cut here ----
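
The entry above iterates in i-k-j order, so its innermost loop walks both b[k][j] and the output row with unit stride. The following is only a minimal sketch of that idea next to the textbook i-j-k order, not the contest harness: N, multiply_ijk, multiply_ikj and orders_agree are hypothetical names, N stands in for the DIM and NUM constants from multiply_d.h (which is not shown in this post), and unlike the entry the i-k-j variant here clears c itself instead of accumulating into it.

```c
#define N 4  /* hypothetical size; the post's DIM and NUM come from multiply_d.h */

/* Textbook i-j-k order: the inner loop reads b[k][j] down a column (stride N). */
void multiply_ijk(double a[][N], double b[][N], double c[][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }
}

/* i-k-j order, as in the entry: the inner loop reads b[k][j] and
   updates c[i][j] along a row, so both accesses are contiguous. */
void multiply_ikj(double a[][N], double b[][N], double c[][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            c[i][j] = 0.0;              /* assumption: clear c here rather than in the caller */
        for (int k = 0; k < N; k++) {
            double aik = a[i][k];       /* hoisted out of the inner loop, like tmp in the entry */
            for (int j = 0; j < N; j++)
                c[i][j] += aik * b[k][j];
        }
    }
}

/* Returns 1 if both loop orders produce the same product for a and b. */
int orders_agree(double a[][N], double b[][N]) {
    double c1[N][N], c2[N][N];
    multiply_ijk(a, b, c1);
    multiply_ikj(a, b, c2);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (c1[i][j] != c2[i][j])
                return 0;
    return 1;
}
```

Both variants add the products for a given c[i][j] in the same k order, so on small integer-valued inputs they should agree exactly; any speed difference comes from the memory access pattern, which only becomes visible at much larger sizes than this N.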
Leave multiply_d.h as it was before.
Cheers
Shachar
P.S. For those who are looking for a more sophisticated challenge: write a
program which calculates the first 10000 digits of Pi as fast as possible.
You may pick any development environment. (I wrote one for Intel's
Pentium, in assembly, under DOS. IIRC it takes about 100 clocks per
digit. I haven't run it for ages, and I am on Linux right now, but the
algorithm I used doesn't scale well.)