[FFmpeg-devel] [PATCH v2] sws/aarch64: add {nv12, nv21, yuv420p, yuv422p}_to_{argb, rgba, abgr, rgba}_neon
Michael Niedermayer
michael at niedermayer.cc
Tue Mar 1 17:18:36 CET 2016
On Tue, Mar 01, 2016 at 11:11:36AM +0100, Clément Bœsch wrote:
> On Mon, Feb 29, 2016 at 10:55:49AM +0100, Clément Bœsch wrote:
> > From: Clément Bœsch <clement at stupeflix.com>
> >
> > ---
> > Changes since latest version:
> > - remove unused 32-bit path
> > - make 16-bit path more accurate by mirroring the MMX code (still not bitexact)
> > - the code was originally trying to process 2 lines at a time to save chroma
> > pre-mult computations and avoid re-reading the whole line; for some reason,
> > this actually made the code around twice as slow, for twice the complexity.
> > Dropping that complexity was a win-win.
> > ---
> > libswscale/aarch64/Makefile | 3 +
> > libswscale/aarch64/swscale_unscaled.c | 132 ++++++++++++++++++++++
> > libswscale/aarch64/yuv2rgb_neon.S | 207 ++++++++++++++++++++++++++++++++++
> > libswscale/swscale_internal.h | 1 +
> > libswscale/swscale_unscaled.c | 2 +
> > 5 files changed, 345 insertions(+)
> > create mode 100644 libswscale/aarch64/Makefile
> > create mode 100644 libswscale/aarch64/swscale_unscaled.c
> > create mode 100644 libswscale/aarch64/yuv2rgb_neon.S
> >
>
> Random benchmark on Hikey (Cortex-A53):
>
> ./ffmpeg -nostats -f lavfi -i testsrc2=s=uhd2160:d=1 -vf format=yuv420p,bench=start,format=rgba,bench=stop -f null -
>
> (yuv420p to rgba in 3840x2160)
>
> before:
> [bench @ 0x2edfe1e0] t:0.181514 avg:0.181514 max:0.181514 min:0.181514
> [bench @ 0x2edfe1e0] t:0.178870 avg:0.180192 max:0.181514 min:0.178870
> [bench @ 0x2edfe1e0] t:0.164448 avg:0.174944 max:0.181514 min:0.164448
> [bench @ 0x2edfe1e0] t:0.164801 avg:0.172408 max:0.181514 min:0.164448
> [bench @ 0x2edfe1e0] t:0.164635 avg:0.170853 max:0.181514 min:0.164448
> [bench @ 0x2edfe1e0] t:0.164756 avg:0.169837 max:0.181514 min:0.164448
> [bench @ 0x2edfe1e0] t:0.164784 avg:0.169115 max:0.181514 min:0.164448
> [bench @ 0x2edfe1e0] t:0.164413 avg:0.168527 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164760 avg:0.168109 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164647 avg:0.167762 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164698 avg:0.167484 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164600 avg:0.167243 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164498 avg:0.167032 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164765 avg:0.166870 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164613 avg:0.166720 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164781 avg:0.166598 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164489 avg:0.166474 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164432 avg:0.166361 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164540 avg:0.166265 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.164524 avg:0.166178 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.165147 avg:0.166129 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.165484 avg:0.166099 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.165703 avg:0.166082 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.165643 avg:0.166064 max:0.181514 min:0.164413
> [bench @ 0x2edfe1e0] t:0.165294 avg:0.166033 max:0.181514 min:0.164413
>
> after:
> [bench @ 0x16d871e0] t:0.042296 avg:0.042296 max:0.042296 min:0.042296
> [bench @ 0x16d871e0] t:0.041986 avg:0.042141 max:0.042296 min:0.041986
> [bench @ 0x16d871e0] t:0.027298 avg:0.037193 max:0.042296 min:0.027298
> [bench @ 0x16d871e0] t:0.027388 avg:0.034742 max:0.042296 min:0.027298
> [bench @ 0x16d871e0] t:0.027383 avg:0.033270 max:0.042296 min:0.027298
> [bench @ 0x16d871e0] t:0.027366 avg:0.032286 max:0.042296 min:0.027298
> [bench @ 0x16d871e0] t:0.027225 avg:0.031563 max:0.042296 min:0.027225
> [bench @ 0x16d871e0] t:0.027685 avg:0.031078 max:0.042296 min:0.027225
> [bench @ 0x16d871e0] t:0.027246 avg:0.030652 max:0.042296 min:0.027225
> [bench @ 0x16d871e0] t:0.027363 avg:0.030323 max:0.042296 min:0.027225
> [bench @ 0x16d871e0] t:0.027449 avg:0.030062 max:0.042296 min:0.027225
> [bench @ 0x16d871e0] t:0.027582 avg:0.029855 max:0.042296 min:0.027225
> [bench @ 0x16d871e0] t:0.027374 avg:0.029664 max:0.042296 min:0.027225
> [bench @ 0x16d871e0] t:0.027429 avg:0.029505 max:0.042296 min:0.027225
> [bench @ 0x16d871e0] t:0.027275 avg:0.029356 max:0.042296 min:0.027225
> [bench @ 0x16d871e0] t:0.027573 avg:0.029244 max:0.042296 min:0.027225
> [bench @ 0x16d871e0] t:0.027219 avg:0.029125 max:0.042296 min:0.027219
> [bench @ 0x16d871e0] t:0.027392 avg:0.029029 max:0.042296 min:0.027219
> [bench @ 0x16d871e0] t:0.027720 avg:0.028960 max:0.042296 min:0.027219
> [bench @ 0x16d871e0] t:0.027449 avg:0.028884 max:0.042296 min:0.027219
> [bench @ 0x16d871e0] t:0.027473 avg:0.028817 max:0.042296 min:0.027219
> [bench @ 0x16d871e0] t:0.027444 avg:0.028755 max:0.042296 min:0.027219
> [bench @ 0x16d871e0] t:0.027535 avg:0.028702 max:0.042296 min:0.027219
> [bench @ 0x16d871e0] t:0.027607 avg:0.028656 max:0.042296 min:0.027219
> [bench @ 0x16d871e0] t:0.027476 avg:0.028609 max:0.042296 min:0.027219
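For readers who want a single number out of the two logs above, here is a minimal Python sketch that parses the last `bench` line of each run and computes the ratio. It is not part of the patch or of FFmpeg; the helper names `parse_bench` and `speedup` are made up for illustration, and the two input strings are simply the final "before" and "after" lines copied from the quoted output. Comparing the running averages gives roughly a 5.8x speedup, and comparing the minimums roughly 6x.

```python
import re

# Final log line of each run, copied verbatim from the benchmark quoted above.
BEFORE = "[bench @ 0x2edfe1e0] t:0.165294 avg:0.166033 max:0.181514 min:0.164413"
AFTER = "[bench @ 0x16d871e0] t:0.027476 avg:0.028609 max:0.042296 min:0.027219"

# Matches the "name:value" pairs printed on each bench line.
FIELD_RE = re.compile(r"\b(t|avg|max|min):([0-9.]+)")


def parse_bench(line):
    """Return the t/avg/max/min fields of one bench log line as floats.

    Hypothetical helper for this example only.
    """
    return {name: float(value) for name, value in FIELD_RE.findall(line)}


def speedup(before_line, after_line, field="avg"):
    """Ratio of the chosen field between the 'before' and 'after' runs."""
    before = parse_bench(before_line)
    after = parse_bench(after_line)
    return before[field] / after[field]


if __name__ == "__main__":
    print("speedup (avg): %.2fx" % speedup(BEFORE, AFTER, "avg"))
    print("speedup (min): %.2fx" % speedup(BEFORE, AFTER, "min"))
```

The first two samples of each run are noticeably slower than the rest, so the minimum is arguably the steadier figure to compare; the running average still includes those warm-up samples.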
LGTM
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Dictatorship: All citizens are under surveillance, all their steps and
actions recorded, for the politicians to enforce control.
Democracy: All politicians are under surveillance, all their steps and
actions recorded, for the citizens to enforce control.