[FFmpeg-devel] [PATCH] lavc/aarch64: add a few SIMD function for AAC PS
Clément Bœsch
u at pkh.me
Thu Jun 1 12:13:19 EEST 2017
On Thu, May 25, 2017 at 01:22:22PM -0300, James Almer wrote:
[...]
> > +function ff_ps_stereo_interpolate_ipdopd_neon, export=1
> > + movrel x5, ipdopd_factors
> > + ld1 {v20.4S}, [x5]
> > + ld1 {v0.4S,v1.4S}, [x2]
> > + ld1 {v6.4S,v7.4S}, [x3]
> > +1:
> > + ld1 {v2.2S}, [x0]
> > + ld1 {v3.2S}, [x1]
> > + dup v2.2D, v2.D[0]
> > + dup v3.2D, v3.D[0]
> > + fadd v0.4S, v0.4S, v6.4S
> > + fadd v1.4S, v1.4S, v7.4S
> > + zip1 v16.4S, v0.4S, v0.4S
> > + zip2 v17.4S, v0.4S, v0.4S
> > + zip1 v18.4S, v1.4S, v1.4S
> > + zip2 v19.4S, v1.4S, v1.4S
> > + fmul v4.4S, v2.4S, v16.4S
> > + fmla v4.4S, v3.4S, v17.4S
> > + ext v2.16B, v2.16B, v2.16B, #4
> > + ext v3.16B, v3.16B, v3.16B, #4
>
> > + fmul v5.4S, v2.4S, v18.4S
> > + fmla v5.4S, v3.4S, v19.4S
> > + fmla v4.4S, v5.4S, v20.4S
>
> You could make ipdopd_factors be 0, INT32_MIN, 0, INT32_MIN then replace
> the fmla with eor + fadd.
> No idea if that will actually be faster, though.
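A hypothetical scalar model of one pass through the quoted loop body can help when checking the lane shuffles. It is derived only from the instructions shown: the stores and the loop counter are cut off in the quote, and the function name is made up. ipdopd_factors is a parameter because its values are not part of the quote. h[0..3] and h[4..7] stand for v0 and v1, and step[] stands for v6/v7. fmla is fused, while this C may round the multiply and the add separately, so results can differ in the last bit.

```c
/* Hypothetical scalar model of one pass through the quoted NEON loop body.
 * h[0..3] models v0, h[4..7] models v1 (both loaded from x2);
 * step[0..7] models v6/v7 (loaded from x3);
 * factors[0..3] models v20 (ipdopd_factors; values not shown in the quote);
 * l and r are one interleaved (re, im) sample each, as loaded from x0/x1. */
static void ipdopd_iteration_model(float h[8], const float step[8],
                                   const float l[2], const float r[2],
                                   const float factors[4], float out[4])
{
    /* fadd v0, v0, v6 and fadd v1, v1, v7: advance the interpolated gains */
    for (int i = 0; i < 8; i++)
        h[i] += step[i];

    for (int k = 0; k < 4; k++) {
        int c    = k & 1;  /* after dup .2D, lane k holds l[k & 1] / r[k & 1] */
        int pair = k >> 1; /* zip1/zip2 of a register with itself repeat each element twice */

        /* fmul v4, v2, v16 ; fmla v4, v3, v17 */
        float v4 = l[c] * h[pair] + r[c] * h[2 + pair];

        /* ext #4 rotates by one lane, so lane k now holds l[c ^ 1] / r[c ^ 1];
         * fmul v5, v2, v18 ; fmla v5, v3, v19 */
        float v5 = l[c ^ 1] * h[4 + pair] + r[c ^ 1] * h[6 + pair];

        /* fmla v4, v5, v20 */
        out[k] = v4 + v5 * factors[k];
    }
}
```

Suppose the factors are {-1, +1, -1, +1}, which is an assumption. Lane 0 then becomes h[0]*l_re + h[2]*r_re - (h[4]*l_im + h[6]*r_im), and lane 1 is the matching imaginary part. That is the shape of a complex multiply-add.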
I'll check the eor + fadd variant when benchmarking.
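The eor + fadd idea relies on how ±1.0 factors behave. For non-NaN inputs, multiplying a float by -1.0 only flips its sign bit, and multiplying by +1.0 leaves it unchanged. So acc + x * f is bit-identical to acc + (x XOR mask), where mask is 0x80000000 (the bit pattern of INT32_MIN) wherever f is -1.0, and 0 wherever f is +1.0. The scalar C sketch checks this lane by lane. The helper names are hypothetical, and it assumes the factor table holds only ±1.0. It says nothing about which variant is faster on a given core.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIGN_MASK 0x80000000u /* same bit pattern as INT32_MIN */

/* Scalar analogue of NEON eor on one 32-bit float lane. */
static float xor_bits(float x, uint32_t mask)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits); /* reinterpret the float's bits safely */
    bits ^= mask;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Current form: fmla acc, x, factors with each factor +1.0f or -1.0f. */
static void lanes_fmla(float acc[4], const float x[4], const float factors[4])
{
    for (int k = 0; k < 4; k++)
        acc[k] += x[k] * factors[k];
}

/* Suggested form: eor with a sign mask, then fadd. */
static void lanes_eor_fadd(float acc[4], const float x[4], const uint32_t masks[4])
{
    for (int k = 0; k < 4; k++) {
        assert(masks[k] == 0 || masks[k] == SIGN_MASK); /* only these two are meaningful */
        acc[k] += xor_bits(x[k], masks[k]);
    }
}
```

Because x * -1.0 is exact, the fused and unfused forms round the same way. The only place the two variants can diverge is the sign of a NaN result.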
Here is a new version adding ff_ps_hybrid_analysis_neon.
It was very fun to write, but there are some oddities:
- the filter lane is 8 but we're only reading 6 (I suppose that's for
performance, though)
- this part is strange:
    INT64FLOAT sum_re = (INT64FLOAT)filter[i][6][0] * in[6][0];
    INT64FLOAT sum_im = (INT64FLOAT)filter[i][6][0] * in[6][1];
why isn't it using the im part of the filter for sum_im?
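The sum_im question can be made concrete. This small C sketch compares the full complex product filter[i][6] * in[6] with the real-only form quoted above. The helper names are hypothetical and it assumes the float build, where INT64FLOAT is plain float. The two forms agree exactly when the tap's imaginary part is 0.0. The quoted code is therefore only correct if filter[i][6][1] is always zero in the table, and this sketch does not check the real table.

```c
/* Hypothetical helpers for the center tap (index 6); float build assumed. */

/* Full complex product filter[i][6] * in[6]. */
static void center_tap_complex(const float f[2], const float x[2], float out[2])
{
    out[0] = f[0] * x[0] - f[1] * x[1]; /* real part */
    out[1] = f[0] * x[1] + f[1] * x[0]; /* imaginary part */
}

/* The form quoted from the C code: only the real part f[0] is used. */
static void center_tap_real_only(const float f[2], const float x[2], float out[2])
{
    out[0] = f[0] * x[0];
    out[1] = f[0] * x[1];
}
```

If the table never stores a non-zero imaginary part at index 6, the shortcut is exact and saves two multiplies per output. Otherwise sum_re and sum_im would both be off.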
I'll post some benchmarks later.
--
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-lavc-aarch64-add-a-few-SIMD-function-for-AAC-PS.patch
Type: text/x-diff
Size: 14974 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20170601/0d1c3bb6/attachment.patch>