[FFmpeg-devel] [PATCH] sws/aarch64: add ff_hscale_8_to_15_neon
Clément Bœsch
u at pkh.me
Thu Mar 24 14:40:49 CET 2016
On Thu, Mar 24, 2016 at 09:35:01AM -0400, Ronald S. Bultje wrote:
> Hi,
>
> On Mar 24, 2016 8:28 AM, "Clément Bœsch" <u at pkh.me> wrote:
> >
> > From: Clément Bœsch <clement at stupeflix.com>
> >
> > ./ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
> >
> > before: t:0.489726 avg:0.489883 max:0.491852 min:0.489482
> > after: t:0.256515 avg:0.256458 max:0.256999 min:0.253755
> > ---
> > Changes:
> > - FIX: not using the v8-v15 registers
> > - writing directly from the SIMD register (thx Martin)
> > - misc reordering
> >
> > I'm looking at the vscale part now.
> > ---
> > libswscale/aarch64/Makefile | 6 +++--
> > libswscale/aarch64/hscale.S | 59 +++++++++++++++++++++++++++++++++++++++++++
> > libswscale/aarch64/swscale.c | 37 +++++++++++++++++++++++++++
> > libswscale/swscale.c | 2 ++
> > libswscale/swscale_internal.h | 1 +
> > libswscale/utils.c | 4 ++-
> > 6 files changed, 106 insertions(+), 3 deletions(-)
> > create mode 100644 libswscale/aarch64/hscale.S
> > create mode 100644 libswscale/aarch64/swscale.c
> Do you intend to create special versions for specific filter widths? (e.g.
> x86 has special versions for filter_width=4 and 8.) That helped speed up
> the default filters (bicubic) a little more.
>
> This version looks OK already for the default case.
>
I don't need these cases immediately (my use case involves filter sizes of
11 and 26), so I have no plans for them so far. I'm actually looking at
yuv2planeX_8 to get more impact on that specific case.
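For readers following the filter-width discussion, here is a minimal scalar sketch in C of what an 8-bit to 15-bit horizontal scaler computes, modeled on the shape of swscale's C reference (hScale8To15_c). The function names below are hypothetical, and the width-4 variant only illustrates the kind of fixed-size specialization Ronald mentions; it is not the x86 code. It assumes the horizontal coefficients are normalized to sum to 1 << 14, so ">> 7" maps an 8-bit sample to roughly 15 bits.

```c
#include <stdint.h>

/* Generic path: any filter size (e.g. 11 or 26 taps).
 * Hypothetical name; mirrors the structure of swscale's scalar reference. */
static void hscale_8_to_15_generic(int16_t *dst, int dst_w,
                                   const uint8_t *src, const int16_t *filter,
                                   const int32_t *filter_pos, int filter_size)
{
    for (int i = 0; i < dst_w; i++) {
        int src_pos = filter_pos[i]; /* first input pixel for output i */
        int val = 0;
        for (int j = 0; j < filter_size; j++)
            val += src[src_pos + j] * filter[filter_size * i + j];
        /* Assumes coefficients sum to 1 << 14: 255 << 14 >> 7 == 32640.
         * Clamp the top end because bicubic lobes can overshoot. */
        val >>= 7;
        dst[i] = val > (1 << 15) - 1 ? (1 << 15) - 1 : val;
    }
}

/* Specialized path for filter_size == 4 (hypothetical name): the inner
 * loop is fully unrolled, which is what a fixed-width SIMD version exploits. */
static void hscale_8_to_15_x4(int16_t *dst, int dst_w, const uint8_t *src,
                              const int16_t *filter, const int32_t *filter_pos)
{
    for (int i = 0; i < dst_w; i++) {
        const uint8_t *s = src + filter_pos[i];
        const int16_t *f = filter + 4 * i;
        int val = s[0] * f[0] + s[1] * f[1] + s[2] * f[2] + s[3] * f[3];
        val >>= 7;
        dst[i] = val > (1 << 15) - 1 ? (1 << 15) - 1 : val;
    }
}
```

Both functions produce identical output for 4-tap filters; the specialized one simply removes the loop over taps, which is why fixed widths help the default bicubic case while filter sizes such as 11 or 26 still go through the generic path.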
--
Clément B.
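The yuv2planeX_8 function mentioned above is the vertical counterpart: it combines filter_size rows of 15-bit intermediates into one 8-bit output row. The sketch below follows the shape of swscale's scalar reference but is illustrative only; the function name is hypothetical, and it assumes the vertical coefficients sum to 1 << 12, so the combined shift back to 8 bits is 7 + 12 = 19.

```c
#include <stdint.h>

/* Vertical scaler sketch (hypothetical name): one output row from
 * filter_size input rows of 15-bit intermediates. */
static void yuv2plane_x_8_sketch(const int16_t *filter, int filter_size,
                                 const int16_t **src, uint8_t *dest, int dst_w,
                                 const uint8_t *dither, int offset)
{
    for (int i = 0; i < dst_w; i++) {
        /* 8-entry ordered-dither pattern, pre-shifted into the
         * accumulator's scale so it adds a fraction of one output LSB. */
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        /* Assumes vertical coefficients sum to 1 << 12: 7 + 12 = 19 bits. */
        val >>= 19;
        dest[i] = val < 0 ? 0 : val > 255 ? 255 : (uint8_t)val;
    }
}
```

Per output pixel this is one multiply-accumulate per tap plus a clamp, so with larger vertical filter sizes it can plausibly account for a noticeable share of the total scaling time, which would make it a natural next NEON target.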