[FFmpeg-devel] [PATCH] Dsputilize some functions from APE decode 1/2 - Altivec implementation
Guillaume Poirier
gpoirier
Sat Jul 5 22:02:38 CEST 2008
Hello,
On 5 Jul 2008 at 20:18, Kostya wrote:
> On Sat, Jul 05, 2008 at 09:17:08PM +0300, Kostya wrote:
>> Here's Altivec version of $subj. SSE2 version will follow next week.
>
> I know the attached patch was approved, but this will give more speedup.
A few comments on your patch:
+static int32_t scalarproduct_int16_altivec(int16_t * v1, int16_t * v2, int order, const int shift)
+{
+ int i;
+ register vector signed short vec1, *pv;
+ register const vector signed int zero = vec_splat_s32(0);
+ register vector signed int res = vec_splat_s32(0), t;
You may want to use LOAD_ZERO, defined in libavcodec/ppc/types_altivec.h.
It also defines a zero vector for all types (short, int, unsigned, ...).
+ register vector unsigned int shifts;
+ int32_t ires __attribute__((aligned(16)));
Please don't use the GCC extension directly; use FFmpeg's
DECLARE_ALIGNED_16 macro instead, which gives:
DECLARE_ALIGNED_16(int32_t, ires);
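To illustrate why a wrapper macro is preferable to the raw attribute, here is a minimal sketch. SKETCH_DECLARE_ALIGNED_16, is_aligned_16 and store_and_reload are hypothetical names made up for this example; FFmpeg's real DECLARE_ALIGNED_16 may be defined differently. The point is only that the compiler-specific spelling stays in one place:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's DECLARE_ALIGNED_16 (an assumption,
 * not the real definition): pick the alignment syntax per compiler. */
#if defined(__GNUC__)
#define SKETCH_DECLARE_ALIGNED_16(t, v) t v __attribute__((aligned(16)))
#elif defined(_MSC_VER)
#define SKETCH_DECLARE_ALIGNED_16(t, v) __declspec(align(16)) t v
#else
#define SKETCH_DECLARE_ALIGNED_16(t, v) _Alignas(16) t v
#endif

/* Returns 1 if the address is a multiple of 16 bytes, 0 otherwise. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Declares a 16-byte-aligned scalar the way the patch would, checks the
 * alignment, then stores and returns the value. */
static int32_t store_and_reload(int32_t value)
{
    SKETCH_DECLARE_ALIGNED_16(int32_t, ires);
    assert(is_aligned_16(&ires));
    ires = value;
    return ires;
}
```

Code that declares variables through the macro does not change when the compiler does; only the macro definition needs to know the extension syntax.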
Also, if I were you, I'd use types such as vec_u8_t and vec_u16_t, both
to shorten your vector type names and to make the actual size of the
data you're manipulating more explicit (the C standard is pretty vague,
to say the least, about the size of fundamental types).
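As a small illustration of why the element width matters, the sketch below computes how many elements of a given size fit in one 128-bit AltiVec register. VECTOR_BYTES, lanes_per_vector and bits_of are hypothetical helpers written for this example, not FFmpeg code. With fixed-width types the lane count is known up front, whereas the C standard only guarantees that short is at least 16 bits wide:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* An AltiVec register is 128 bits, i.e. 16 bytes, wide. */
enum { VECTOR_BYTES = 16 };

/* Number of elements of the given size that fit in one vector register. */
static size_t lanes_per_vector(size_t elem_size)
{
    assert(elem_size > 0 && VECTOR_BYTES % elem_size == 0);
    return VECTOR_BYTES / elem_size;
}

/* Storage width of an element in bits. */
static int bits_of(size_t elem_size)
{
    return (int)(elem_size * CHAR_BIT);
}
```

For example, lanes_per_vector(sizeof(int16_t)) is 8 and lanes_per_vector(sizeof(int32_t)) is 4, which is the information a name like vec_u16_t carries directly in its spelling.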
Those were my 2c...
Guillaume