[FFmpeg-devel] [PATCH 2/7] ARM: NEON optimised simple_idct
Ian Caulfield
ian.caulfield
Sat Dec 6 01:17:35 CET 2008
2008/12/5 Mans Rullgard <mans at mansr.com>:
> +function idct_col4_st8_neon
> + vshr.s32 q2, q3, #COL_SHIFT
> + vshr.s32 q3, q4, #COL_SHIFT
> + vmovn.i32 d2, q2
> + vshr.s32 q4, q7, #COL_SHIFT
> + vmovn.i32 d3, q3
> + vshr.s32 q5, q8, #COL_SHIFT
> + vqmovun.s16 d2, q1
> + vmovn.i32 d4, q4
> + vshr.s32 q6, q14, #COL_SHIFT
> + vst1.32 {d2[0]}, [r0,:32], r1
> + vmovn.i32 d5, q5
> + vshr.s32 q7, q13, #COL_SHIFT
> + vst1.32 {d2[1]}, [r0,:32], r1
> + vmovn.i32 d6, q6
> + vqmovun.s16 d3, q2
I'm probably missing something fundamental here, but could the
sequence of instructions
    vadd.i32 q3, q11, q9 (in col4_neon)
    vadd.i32 q4, q12, q10 (in col4_neon)
    vshr.s32 q2, q3, #COL_SHIFT
    vshr.s32 q3, q4, #COL_SHIFT
    vmovn.i32 d2, q2
    vmovn.i32 d3, q3
    vqmovun.s16 d2, q1
be replaced by something like
    vaddhn.s32 d6, q11, q9
    vaddhn.s32 d7, q12, q10
    vqshrun.s16 d2, q3, #COL_SHIFT-16
?
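
A rough way to sanity-check the question is a scalar C model of one lane, comparing what each sequence does to the 32-bit sum. This is only a sketch: it assumes COL_SHIFT is 20, the helper names (asr32, sat_u8, narrow_original, narrow_proposed, count_mismatches) are made up for illustration, and it models the instruction semantics as I understand them rather than running on real hardware.

```c
#include <stdint.h>

#define COL_SHIFT 20  /* assumption: not stated in the quoted hunk */

/* Arithmetic (sign-preserving) right shift, written portably. */
static int32_t asr32(int32_t x, int n)
{
    return x >= 0 ? x >> n : ~(~x >> n);
}

/* Saturate a signed 16-bit value to unsigned 8 bits, as vqmovun.s16 does. */
static uint8_t sat_u8(int16_t x)
{
    return x < 0 ? 0 : x > 255 ? 255 : (uint8_t)x;
}

/* Original: vshr.s32 #COL_SHIFT, vmovn.i32, vqmovun.s16.
 * After the shift the value fits in 12 signed bits, so the
 * truncating narrow to 16 bits loses nothing. */
static uint8_t narrow_original(int32_t sum)
{
    int16_t n = (int16_t)asr32(sum, COL_SHIFT);
    return sat_u8(n);
}

/* Proposed: vaddhn keeps bits 31:16 of the sum, then vqshrun.s16
 * shifts right by COL_SHIFT-16 (truncating) and saturates to u8. */
static uint8_t narrow_proposed(int32_t sum)
{
    int16_t hi = (int16_t)asr32(sum, 16);
    return sat_u8((int16_t)asr32(hi, COL_SHIFT - 16));
}

/* Walk the 32-bit range with a positive stride and count disagreements. */
static long count_mismatches(int32_t step)
{
    long bad = 0;
    for (int64_t s = INT32_MIN; s <= INT32_MAX; s += step)
        if (narrow_original((int32_t)s) != narrow_proposed((int32_t)s))
            bad++;
    return bad;
}
```

In this model the two paths agree, because shifting right by 16 and then by COL_SHIFT-16 is the same as one truncating shift by COL_SHIFT. The immediate for vqshrun.s16 has to lie between 1 and 8, so the trick would only apply for COL_SHIFT from 17 to 24.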
Ian