[FFmpeg-devel] [PATCH] ARM: NEON optimised float_to_int16
Måns Rullgård
mans
Thu Dec 18 01:33:04 CET 2008
"Ian Caulfield" <ian.caulfield at gmail.com> writes:
> 2008/12/17 Mans Rullgard <mans at mansr.com>:
>> + vld1.64 {d0-d1}, [r1,:128]!
>> + vcvt.s32.f32 q0, q0, #16
>> + vshrn.s32 d5, q9, #16
>> + vld1.64 {d2-d3}, [r1,:128]!
>
> Is there any particular reason not to use this?
>
> vld1.64 {d0-d3}, [r1,:128]!
The 4-register load uses 2 issue cycles, only one of which can
dual-issue. A 2-register load uses only one cycle and can dual-issue
with either the preceding or the following instruction. Splitting the
loads is therefore often faster in situations like this, where there
are good opportunities for dual-issue. I didn't benchmark this
particular case, though.
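The dual-issue argument above can be illustrated with a toy issue-cycle count. This is a minimal sketch, not a model of any real core. Only two costs come from the post: a 4-register vld1 takes two issue cycles, of which one can dual-issue, and a 2-register vld1 takes one dual-issuable cycle. Everything else is an assumption. The model assumes that the vcvt and vshrn ops each take one pairable cycle, that only a load/store slot and an arithmetic slot may share a cycle, and that the pairable half of the 4-register load is its second cycle. The names `Slot` and `issue_cycles` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    unit: str       # "ls" = load/store pipe, "alu" = NEON arithmetic pipe
    pairable: bool  # may this issue cycle share a cycle with a neighbour?


# Costs from the post. Which half of the 4-register load is pairable is
# an assumption (here the second, the best case for the next instruction).
VLD1_2REG = [Slot("ls", True)]
VLD1_4REG = [Slot("ls", False), Slot("ls", True)]
# Assumption: vcvt / vshrn each occupy one pairable arithmetic issue cycle.
NEON_ALU = [Slot("alu", True)]


def issue_cycles(program):
    """Count issue cycles in an in-order, at-most-dual-issue toy model.

    Two adjacent slots share one cycle only if both are pairable and they
    use different units (one load/store plus one arithmetic op). There is
    no reordering: slots issue strictly in program order.
    """
    slots = [slot for instr in program for slot in instr]
    cycles = 0
    i = 0
    while i < len(slots):
        cur = slots[i]
        nxt = slots[i + 1] if i + 1 < len(slots) else None
        if (nxt is not None and cur.pairable and nxt.pairable
                and cur.unit != nxt.unit):
            i += 2  # dual-issue: both slots go in this cycle
        else:
            i += 1  # single-issue
        cycles += 1
    return cycles


# vld1 {d0-d1}; vcvt; vshrn; vld1 {d2-d3}  (the ordering in the quoted patch)
split = [VLD1_2REG, NEON_ALU, NEON_ALU, VLD1_2REG]
# vld1 {d0-d3}; vcvt; vshrn                (the suggested single load)
merged = [VLD1_4REG, NEON_ALU, NEON_ALU]

if __name__ == "__main__":
    print(issue_cycles(split), issue_cycles(merged))
```

Under these assumptions, the split loads pair with the surrounding arithmetic ops and issue in 2 cycles, while the single 4-register load needs 3. The model says nothing about real stalls or latencies. As the post notes, the actual case was not benchmarked.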
--
Måns Rullgård
mans at mansr.com