[Ffmpeg-devel] [PATCH] Some MMX optimizations for Chinese AVS
Michael Niedermayer
michaelni
Fri Jul 28 21:42:33 CEST 2006
Hi
On Fri, Jul 28, 2006 at 07:57:41PM +0200, Stefan Gehrer wrote:
[...]
> @@ -2779,6 +2795,8 @@
> c->idct_permutation_type= FF_PARTTRANS_IDCT_PERM;
> }
> #endif
> + }else if(idct_algo==FF_IDCT_H264){
> + c->idct_permutation_type= FF_TRANSPOSE_IDCT_PERM;
cavs idct != h.264 idct IIRC
[...]
> +static const uint64_t ff_pw_4 __attribute__ ((aligned(8))) = 0x0004000400040004ULL;
> +static const uint64_t ff_pw_5 __attribute__ ((aligned(8))) = 0x0005000500050005ULL;
> +static const uint64_t ff_pw_7 __attribute__ ((aligned(8))) = 0x0007000700070007ULL;
> +static const uint64_t ff_pw_42 __attribute__ ((aligned(8))) = 0x002A002A002A002AULL;
> +static const uint64_t ff_pw_64 __attribute__ ((aligned(8))) = 0x0040004000400040ULL;
> +static const uint64_t ff_pw_96 __attribute__ ((aligned(8))) = 0x0060006000600060ULL;
DECLARE_ALIGNED_8 should be used here
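
A minimal sketch of what that could look like, assuming a GCC-style compiler. The macro body below is a hypothetical stand-in written for this example; the project's own definition of DECLARE_ALIGNED_8 may differ. Only two of the constants are shown, and is_aligned_8() and check_alignment() are illustrative helpers, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the project's macro (GCC/Clang attribute syntax).
 * The real definition lives in the project headers and may differ. */
#define DECLARE_ALIGNED_8(t, v) t v __attribute__ ((aligned (8)))

/* Packed-word constants: the same 16-bit value repeated in all four lanes. */
static const DECLARE_ALIGNED_8(uint64_t, ff_pw_4)  = 0x0004000400040004ULL;
static const DECLARE_ALIGNED_8(uint64_t, ff_pw_64) = 0x0040004000400040ULL;

/* Illustrative helper: returns 1 if p lies on an 8-byte boundary. */
static int is_aligned_8(const void *p)
{
    return ((uintptr_t)p & 7) == 0;
}

/* Illustrative check: the macro works for file-scope constants and for
 * locals alike (tmp here is a hypothetical local, not the patch's code). */
static void check_alignment(void)
{
    DECLARE_ALIGNED_8(uint64_t, tmp) = 0;

    assert(is_aligned_8(&ff_pw_4));
    assert(is_aligned_8(&ff_pw_64));
    assert(is_aligned_8(&tmp));
}
```

Hiding the attribute behind one macro keeps the compiler-specific syntax in a single place instead of repeating it on every declaration.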
[...]
> + "psllw $1, %%mm4 \n\t" /* mm4 = 2*src7 */
> + "psllw $1, %%mm3 \n\t" /* mm3 = 2*src1 */
> + "psllw $1, %%mm6 \n\t" /* mm6 = 2*src5 */
> + "psllw $1, %%mm1 \n\t" /* mm1 = 2*src3 */
I think paddw is faster than psllw $1 on some CPUs and equally fast on the
rest
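
The substitution is safe because adding a register to itself gives the same result as shifting each 16-bit lane left by one, including wraparound. A small scalar model in portable C, as a sketch only: the function names are made up for this example, and it checks equivalence of results, not speed.

```c
#include <stdint.h>

#define LANES 4  /* one 64-bit MMX register holds four 16-bit words */

/* Scalar model of "psllw $1, reg": shift each 16-bit lane left by one. */
static void model_psllw1(uint16_t r[LANES])
{
    for (int i = 0; i < LANES; i++)
        r[i] = (uint16_t)(r[i] << 1);
}

/* Scalar model of "paddw reg, reg": add each lane to itself, mod 2^16. */
static void model_paddw_self(uint16_t r[LANES])
{
    for (int i = 0; i < LANES; i++)
        r[i] = (uint16_t)(r[i] + r[i]);
}

/* Returns 1 if both models produce identical lanes for the given input. */
static int doubling_matches(const uint16_t in[LANES])
{
    uint16_t a[LANES], b[LANES];

    for (int i = 0; i < LANES; i++)
        a[i] = b[i] = in[i];

    model_psllw1(a);
    model_paddw_self(b);

    for (int i = 0; i < LANES; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

In the quoted asm this would mean writing, for example, "paddw %%mm4, %%mm4" in place of "psllw $1, %%mm4".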
[...]
> +static void cavs_idct8_add_mmx(uint8_t *dst, int16_t *block, int stride)
> +{
> + int i;
> + int16_t __attribute__ ((aligned(8))) b2[64];
> +
> + for(i=0; i<2; i++){
> + uint64_t tmp;
this (tmp) should be aligned as well
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is