[FFmpeg-devel] [PATCH] Dsputilize some functions from APE decode 1/2 - Altivec implementation

Sat Jul 5 22:02:38 CEST 2008

Hello,

On 5 Jul 2008, at 20:18, Kostya wrote:

> On Sat, Jul 05, 2008 at 09:17:08PM +0300, Kostya wrote:
>> Here's the Altivec version of $subj. The SSE2 version will follow next week.
>
> I know the attached patch was approved, but this will give more speedup.

A few comments on your patch:

+static int32_t scalarproduct_int16_altivec(int16_t * v1, int16_t * v2, int order, const int shift)
+{
+    int i;
+    register vector signed short vec1, *pv;
+    register const vector signed int zero = vec_splat_s32(0);
+    register vector signed int res = vec_splat_s32(0), t;

You may want to use LOAD_ZERO, defined in libavcodec/ppc/types_altivec.h.

That header also defines a zero vector for every element type (short, int, unsigned, ...).
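To illustrate the idea (one zero vector declared once, then viewed as whichever element type is needed, instead of a separate vec_splat_s32(0) per type), here is a small sketch. AltiVec types only compile on PowerPC, so it assumes the GCC/Clang generic vector extension as a stand-in; the names SKETCH_LOAD_ZERO, sketch_zero_s16 and sketch_zero_s32 are hypothetical and are not the actual definitions from types_altivec.h.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for 128-bit AltiVec vectors (GCC/Clang vector extension). */
typedef uint8_t vu8  __attribute__((vector_size(16)));
typedef int16_t vs16 __attribute__((vector_size(16)));
typedef int32_t vs32 __attribute__((vector_size(16)));

/* All views have the same size, so a cast just reinterprets the bits. */
static_assert(sizeof(vu8) == sizeof(vs32), "same-size vectors");

/* Hypothetical analogue of the LOAD_ZERO pattern: declare one zero
 * vector, then cast it to the element type each use site needs. */
#define SKETCH_LOAD_ZERO const vu8 zerov = {0}
#define sketch_zero_s16 ((vs16) zerov)
#define sketch_zero_s32 ((vs32) zerov)

/* Start a 32-bit accumulator from the shared zero, add two vectors,
 * then reduce the four lanes to a scalar. */
static int32_t sum_lanes_s32(vs32 a, vs32 b)
{
    SKETCH_LOAD_ZERO;
    vs32 acc = sketch_zero_s32;  /* instead of its own vec_splat_s32(0) */
    acc += a;
    acc += b;
    return acc[0] + acc[1] + acc[2] + acc[3];
}

/* The same zero vector viewed as eight 16-bit lanes. */
static int16_t first_lane_of_zero_s16(void)
{
    SKETCH_LOAD_ZERO;
    vs16 z = sketch_zero_s16;
    return z[0];
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, sum_lanes_s32 returns 110. The benefit is mostly readability: a single declaration serves every element type.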

+    register vector unsigned int shifts;
+    int32_t ires __attribute__((aligned(16)));

Please don't use the GCC extension directly; use FFmpeg's DECLARE_ALIGNED_16 macro instead, which gives:

DECLARE_ALIGNED_16(int32_t, ires)
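The point of the macro is to keep compiler-specific alignment syntax in one place. Below is a rough sketch of that idea. It assumes a GCC-compatible compiler and uses a hypothetical local macro, SKETCH_DECLARE_ALIGNED_16, which is not FFmpeg's actual definition. The aligned scratch buffer plays the role of ires: somewhere a vector result can be stored before it is read back as a scalar.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macro in the spirit of DECLARE_ALIGNED_16: callers never
 * spell out __attribute__ themselves. Assumes GCC/Clang; another compiler
 * would need its own branch here. */
#define SKETCH_DECLARE_ALIGNED_16(t, v) t v __attribute__((aligned(16)))

/* Stand-in for a 128-bit vector of four 32-bit lanes. */
typedef int32_t vs32 __attribute__((vector_size(16)));

static int is_aligned_16(const void *p)
{
    return ((uintptr_t) p % 16) == 0;
}

/* Copy a vector into an aligned scratch buffer and return lane 0,
 * similar in purpose to storing an AltiVec result to an aligned int32_t. */
static int32_t extract_lane0(vs32 v)
{
    SKETCH_DECLARE_ALIGNED_16(int32_t, buf[4]);
    assert(is_aligned_16(buf));  /* what an aligned vector store requires */
    memcpy(buf, &v, sizeof(buf));
    return buf[0];
}
```

A declaration such as SKETCH_DECLARE_ALIGNED_16(int32_t, ires); expands to int32_t ires __attribute__((aligned(16))). Porting to another compiler then means changing one macro, not every call site.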

Also, if I were you, I'd use the types vec_u8_t, vec_u16_t, etc. They shorten your vector type names, and they make the actual size of the data you're manipulating explicit. The C standard's definition of the sizes of the fundamental types is pretty vague, to say the least.
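To show why width-explicit names help, here is a sketch of short vector typedefs built on the fixed-width <stdint.h> types. Lane counts can then be checked at compile time. It uses the GCC/Clang generic vector extension rather than AltiVec's "vector unsigned short" syntax, and the sk_vec_*_t names and SK_LANES macro are hypothetical. They only mimic the style of vec_u8_t and vec_u16_t.

```c
#include <assert.h>
#include <stdint.h>

/* Width is part of the name, and the element type is exact. */
typedef uint8_t  sk_vec_u8_t  __attribute__((vector_size(16)));
typedef uint16_t sk_vec_u16_t __attribute__((vector_size(16)));
typedef int16_t  sk_vec_s16_t __attribute__((vector_size(16)));
typedef int32_t  sk_vec_s32_t __attribute__((vector_size(16)));

/* Number of lanes in a 128-bit vector of element type et. */
#define SK_LANES(vt, et) (sizeof(vt) / sizeof(et))

static_assert(SK_LANES(sk_vec_u8_t,  uint8_t)  == 16, "16 x 8-bit lanes");
static_assert(SK_LANES(sk_vec_u16_t, uint16_t) == 8,  "8 x 16-bit lanes");
static_assert(SK_LANES(sk_vec_s16_t, int16_t)  == 8,  "8 x 16-bit lanes");
static_assert(SK_LANES(sk_vec_s32_t, int32_t)  == 4,  "4 x 32-bit lanes");

/* Sum the eight 16-bit lanes into a 32-bit result, so the widening
 * is visible in the types rather than implied by short/int. */
static int32_t horizontal_sum_s16(sk_vec_s16_t v)
{
    int32_t s = 0;
    for (int i = 0; i < (int) SK_LANES(sk_vec_s16_t, int16_t); i++)
        s += v[i];
    return s;
}
```

For example, {1, 2, 3, 4, 5, 6, 7, -8} sums to 20. With plain "vector signed short", a reader has to know the platform to be sure each lane is 16 bits wide.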

Those were my 2c...

Guillaume