[FFmpeg-devel] [PATCH] ARM: NEON optimised float_to_int16

Thu Dec 18 01:33:04 CET 2008

"Ian Caulfield" <ian.caulfield at gmail.com> writes:

> 2008/12/17 Mans Rullgard <mans at mansr.com>:
>> +        vld1.64         {d0-d1},  [r1,:128]!
>> +        vcvt.s32.f32    q0,  q0,  #16
>> +        vshrn.s32       d5,  q9,  #16
>> +        vld1.64         {d2-d3},  [r1,:128]!
>
> Is there any particular reason not to use this?
>
> vld1.64     {d0-d3},  [r1,:128]!

The 4-register load takes 2 issue cycles, only one of which can
dual-issue.  A 2-register load takes only one cycle and can dual-issue
with either the preceding or the following instruction.  Splitting the
loads is often faster in situations like this, where there are good
opportunities for dual-issue.  I didn't benchmark this particular case,
though.
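
As a rough sketch of the difference (the registers and instruction
order below are made up for illustration and are not the schedule used
in the patch; q2 and q3 are assumed to hold floats loaded earlier):

        @ one 4-register load: 2 issue cycles, only one of them can pair
        vld1.64         {d0-d3},  [r1,:128]!
        vcvt.s32.f32    q2,  q2,  #16
        vcvt.s32.f32    q3,  q3,  #16

        @ split loads: 1 issue cycle each, either can pair with a neighbour
        vld1.64         {d0-d1},  [r1,:128]!
        vcvt.s32.f32    q2,  q2,  #16
        vld1.64         {d2-d3},  [r1,:128]!
        vcvt.s32.f32    q3,  q3,  #16

Whether the second form actually wins depends on the surrounding code,
so it would need measuring.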

-- 
Måns Rullgård
mans at mansr.com