[Ffmpeg-devel] [RFC] svq1 very slow encoding

Sat Mar 31 02:57:35 CEST 2007

On Thu, 29 Mar 2007, Loren Merritt wrote:
>
> 65% of the CPU time was spent on one line. Clearly a candidate for SIMD.
>
> Patch makes the encode 2.3x faster on an Athlon 64. Additional speedups I
> tried but didn't include here: using inline instead of dsp adds another
> 10%, and 3DNow! adds 3%.

+static int ssd_int8_vs_int16_mmx(int8_t *pix1, int16_t *pix2, int size){
+    int sum;
+    long i=size;
+    asm volatile(
...
+        "movd %%mm4, %1 \n"
+        :"+r"(i), "=r"(sum)
+        :"r"(pix1), "r"(pix2)
+    );
+    return sum;

Shouldn't that be "+&r"(i)?

On x86-64, could "int sum" end up in a 64-bit register?  That would
generate something like "movd %mm4, %rax".  I don't have a 64-bit system to
test on, but is movd even valid with a 64-bit general-purpose register?  If
it is, isn't the code still wrong, since %rax would have garbage in the top
32 bits?
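
For checking any change to the constraints or register widths, a plain C
reference is handy.  The following is only a sketch of what I assume the
routine computes, going by its name: the sum of squared differences between
an int8_t block and an int16_t block.  The name ssd_int8_vs_int16_ref is
hypothetical and does not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical scalar reference, not taken from the patch.
 * Assumption: the MMX routine returns the sum of squared differences
 * between pix1[i] and pix2[i] over `size` elements. */
static int ssd_int8_vs_int16_ref(const int8_t *pix1, const int16_t *pix2, int size)
{
    int sum = 0;
    for (int i = 0; i < size; i++) {
        int d = pix1[i] - pix2[i];  /* both operands are promoted to int */
        sum += d * d;
    }
    return sum;
}
```

Comparing its result against the asm version on the same buffers, on both
32-bit and 64-bit builds, should show whether the upper bits of the output
register ever leak into the return value.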