[Ffmpeg-devel] [RFC] LZO optimization, how to detect builtin memcpy?
Reimar Döffinger
Reimar.Doeffinger
Tue Jan 30 19:36:46 CET 2007
Hello,
the attached patch provides some LZO decoding speedup if either the compiler
has a builtin memcpy for fixed-size copies or the architecture can do
unaligned loads/stores.
The problem is, I don't know how to properly detect either one.
Any ideas? Or should I just commit the attached patch, which assumes that a
memcpy builtin is used? Having an UNALIGNED_LOADSTORE or similar define
available would be nice for the code in intreadwrite.h too, I guess
(although there you can at least use unaligned_16 for GNU compilers;
in this case, though, that would slow things down if unaligned
loads/stores are not supported).
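All three variants produce the same bytes and differ only in speed, so one way to approximate the detection is to pick a branch from compiler- and architecture-predefined macros. The following is a minimal sketch, not part of the patch. It assumes that GCC (`__GNUC__`) inlines a constant-size `memcpy`, and that x86 (`__i386__`, `__x86_64__`) handles unaligned 32-bit accesses cheaply. For the unaligned case it uses a GCC `packed, may_alias` struct, modeled on what is presumably the packed-struct approach behind `unaligned_16`. All `HYP_`/`hyp_` names are hypothetical, not existing FFmpeg defines.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical detection: assume x86 handles unaligned 32-bit access cheaply. */
#if defined(__i386__) || defined(__x86_64__)
#define HYP_UNALIGNED_LOADSTORE 1
#endif

#if defined(HYP_UNALIGNED_LOADSTORE) && defined(__GNUC__)
/* GCC-specific: packed lets the compiler emit a (possibly unaligned) 32-bit
 * load/store; may_alias keeps the access legal on a uint8_t buffer. */
struct hyp_unaligned32 { uint32_t v; } __attribute__((packed, may_alias));
#define HYP_COPY4(d, s) \
    (((struct hyp_unaligned32 *)(d))->v = ((const struct hyp_unaligned32 *)(s))->v)
#elif defined(__GNUC__)
/* Assumption: GCC expands a constant-size memcpy into a few moves. */
#define HYP_COPY4(d, s) memcpy((d), (s), 4)
#else
/* Portable fallback: four byte copies, wrapped so the macro is one statement. */
#define HYP_COPY4(d, s) do {                  \
        (d)[0] = (s)[0]; (d)[1] = (s)[1];     \
        (d)[2] = (s)[2]; (d)[3] = (s)[3];     \
    } while (0)
#endif

/* Copy 4 bytes between possibly unaligned, non-overlapping buffers. */
static void hyp_copy4(uint8_t *dst, const uint8_t *src)
{
    HYP_COPY4(dst, src);
}
```

An architecture missing from the list simply falls back to the `memcpy` or byte-copy branch, which stays correct but may forgo the speedup.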
Greetings,
Reimar Döffinger
-------------- next part --------------
Index: libavcodec/lzo.c
===================================================================
--- libavcodec/lzo.c (revision 7768)
+++ libavcodec/lzo.c (working copy)
@@ -66,6 +66,19 @@
return cnt;
}
+//#define UNALIGNED_LOADSTORE
+#define BUILTIN_MEMCPY
+#ifdef UNALIGNED_LOADSTORE
+#define COPY2(d, s) *(uint16_t *)(d) = *(uint16_t *)(s)
+#define COPY4(d, s) *(uint32_t *)(d) = *(uint32_t *)(s)
+#elif defined(BUILTIN_MEMCPY)
+#define COPY2(d, s) memcpy(d, s, 2)
+#define COPY4(d, s) memcpy(d, s, 4)
+#else
+#define COPY2(d, s) do { (d)[0] = (s)[0]; (d)[1] = (s)[1]; } while (0)
+#define COPY4(d, s) do { (d)[0] = (s)[0]; (d)[1] = (s)[1]; (d)[2] = (s)[2]; (d)[3] = (s)[3]; } while (0)
+#endif
+
/**
* \brief copy bytes from input to output buffer with checking
* \param cnt number of bytes to copy, must be > 0
@@ -82,10 +95,7 @@
c->error |= LZO_OUTPUT_FULL;
}
#if defined(INBUF_PADDED) && defined(OUTBUF_PADDED)
- dst[0] = src[0];
- dst[1] = src[1];
- dst[2] = src[2];
- dst[3] = src[3];
+ COPY4(dst, src);
src += 4;
dst += 4;
cnt -= 4;
@@ -120,22 +130,16 @@
dst += cnt;
} else {
#ifdef OUTBUF_PADDED
- dst[0] = src[0];
- dst[1] = src[1];
- dst[2] = src[2];
- dst[3] = src[3];
+ COPY2(dst, src);
+ COPY2(dst + 2, src + 2);
src += 4;
dst += 4;
cnt -= 4;
if (cnt > 0) {
- dst[0] = src[0];
- dst[1] = src[1];
- dst[2] = src[2];
- dst[3] = src[3];
- dst[4] = src[4];
- dst[5] = src[5];
- dst[6] = src[6];
- dst[7] = src[7];
+ COPY2(dst, src);
+ COPY2(dst + 2, src + 2);
+ COPY2(dst + 4, src + 4);
+ COPY2(dst + 6, src + 6);
src += 8;
dst += 8;
cnt -= 8;
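In the back-reference branch the patch copies in 2-byte steps rather than with a single 4-byte copy. One plausible reason, inferred rather than stated in the post, is that the `if` branch not shown in this hunk handles `back == 1` (with a `memset`, as in the FFmpeg source of that era). That leaves `back >= 2` here. Each 2-byte step then reads only bytes strictly before its destination, and all of those bytes are already final. A 4-byte copy with `back` of 2 or 3 would instead read bytes it has not produced yet. The sketch below checks that chunked 2-byte copies reproduce a byte-by-byte LZ77-style copy. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference semantics of an LZ77 back-reference: byte by byte, so the
 * source may overlap the bytes being produced. */
static void hyp_backcopy_bytes(uint8_t *dst, size_t back, size_t cnt)
{
    const uint8_t *src = dst - back;
    for (size_t i = 0; i < cnt; i++)
        dst[i] = src[i];
}

/* Chunked variant in the spirit of the patch: 2 bytes per step.
 * Requires back >= 2 so each memcpy's source and destination are disjoint;
 * may write 1 byte past dst + cnt when cnt is odd (output must be padded). */
static void hyp_backcopy_pairs(uint8_t *dst, size_t back, size_t cnt)
{
    const uint8_t *src = dst - back;
    for (size_t i = 0; i < cnt; i += 2)
        memcpy(dst + i, src + i, 2);
}
```

For example, with `back` equal to 2 and a seed of `ab`, a 6-byte pair-wise copy extends the buffer to `abababab`, the same result as the byte-by-byte version.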