[FFmpeg-devel] [PATCH v2] sws/aarch64: add {nv12, nv21, yuv420p, yuv422p}_to_{argb, rgba, abgr, bgra}_neon
Clément Bœsch
u at pkh.me
Tue Mar 1 11:11:36 CET 2016
On Mon, Feb 29, 2016 at 10:55:49AM +0100, Clément Bœsch wrote:
> From: Clément Bœsch <clement at stupeflix.com>
>
> ---
> Changes since latest version:
> - remove unused 32-bit path
> - make 16-bit path more accurate by mirroring the MMX code (still not bitexact)
> - the code was originally trying to process 2 lines at a time to save chroma
> pre-mult computations and avoid re-reading the whole line; for some reason,
> this actually made the code around twice as slow, for twice the complexity.
> Dropping that complexity was a win-win.
> ---
> libswscale/aarch64/Makefile | 3 +
> libswscale/aarch64/swscale_unscaled.c | 132 ++++++++++++++++++++++
> libswscale/aarch64/yuv2rgb_neon.S | 207 ++++++++++++++++++++++++++++++++++
> libswscale/swscale_internal.h | 1 +
> libswscale/swscale_unscaled.c | 2 +
> 5 files changed, 345 insertions(+)
> create mode 100644 libswscale/aarch64/Makefile
> create mode 100644 libswscale/aarch64/swscale_unscaled.c
> create mode 100644 libswscale/aarch64/yuv2rgb_neon.S
>
Random benchmark on Hikey (Cortex-A53):
./ffmpeg -nostats -f lavfi -i testsrc2=s=uhd2160:d=1 -vf format=yuv420p,bench=start,format=rgba,bench=stop -f null -
(yuv420p to rgba in 3840x2160)
before:
[bench @ 0x2edfe1e0] t:0.181514 avg:0.181514 max:0.181514 min:0.181514
[bench @ 0x2edfe1e0] t:0.178870 avg:0.180192 max:0.181514 min:0.178870
[bench @ 0x2edfe1e0] t:0.164448 avg:0.174944 max:0.181514 min:0.164448
[bench @ 0x2edfe1e0] t:0.164801 avg:0.172408 max:0.181514 min:0.164448
[bench @ 0x2edfe1e0] t:0.164635 avg:0.170853 max:0.181514 min:0.164448
[bench @ 0x2edfe1e0] t:0.164756 avg:0.169837 max:0.181514 min:0.164448
[bench @ 0x2edfe1e0] t:0.164784 avg:0.169115 max:0.181514 min:0.164448
[bench @ 0x2edfe1e0] t:0.164413 avg:0.168527 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164760 avg:0.168109 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164647 avg:0.167762 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164698 avg:0.167484 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164600 avg:0.167243 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164498 avg:0.167032 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164765 avg:0.166870 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164613 avg:0.166720 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164781 avg:0.166598 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164489 avg:0.166474 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164432 avg:0.166361 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164540 avg:0.166265 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164524 avg:0.166178 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.165147 avg:0.166129 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.165484 avg:0.166099 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.165703 avg:0.166082 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.165643 avg:0.166064 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.165294 avg:0.166033 max:0.181514 min:0.164413
after:
[bench @ 0x16d871e0] t:0.042296 avg:0.042296 max:0.042296 min:0.042296
[bench @ 0x16d871e0] t:0.041986 avg:0.042141 max:0.042296 min:0.041986
[bench @ 0x16d871e0] t:0.027298 avg:0.037193 max:0.042296 min:0.027298
[bench @ 0x16d871e0] t:0.027388 avg:0.034742 max:0.042296 min:0.027298
[bench @ 0x16d871e0] t:0.027383 avg:0.033270 max:0.042296 min:0.027298
[bench @ 0x16d871e0] t:0.027366 avg:0.032286 max:0.042296 min:0.027298
[bench @ 0x16d871e0] t:0.027225 avg:0.031563 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027685 avg:0.031078 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027246 avg:0.030652 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027363 avg:0.030323 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027449 avg:0.030062 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027582 avg:0.029855 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027374 avg:0.029664 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027429 avg:0.029505 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027275 avg:0.029356 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027573 avg:0.029244 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027219 avg:0.029125 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027392 avg:0.029029 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027720 avg:0.028960 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027449 avg:0.028884 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027473 avg:0.028817 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027444 avg:0.028755 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027535 avg:0.028702 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027607 avg:0.028656 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027476 avg:0.028609 max:0.042296 min:0.027219
[...]
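
For anyone who wants a single number out of logs like the ones above, here is a minimal sketch in Python that extracts the per-frame `t:` values and compares medians. The helper names (`frame_times`, `speedup`) are made up for illustration and are not part of FFmpeg; the sample strings are a few lines copied from the output above, and the regex assumes the line format shown there.

```python
import re
from statistics import median

# Matches lines such as "[bench @ 0x2edfe1e0] t:0.181514 avg:... max:... min:..."
# (format assumed from the log output quoted in this mail).
BENCH_LINE = re.compile(r"\[bench @ 0x[0-9a-f]+\] t:(\d+\.\d+)")


def frame_times(log: str) -> list[float]:
    """Return the per-frame 't:' values (in seconds) found in a bench log."""
    return [float(m.group(1)) for m in BENCH_LINE.finditer(log)]


def speedup(before_log: str, after_log: str) -> float:
    """Ratio of median per-frame times; the median damps the slow warm-up frames."""
    return median(frame_times(before_log)) / median(frame_times(after_log))


# A few lines copied from the "before" and "after" runs above.
before = """\
[bench @ 0x2edfe1e0] t:0.181514 avg:0.181514 max:0.181514 min:0.181514
[bench @ 0x2edfe1e0] t:0.164448 avg:0.174944 max:0.181514 min:0.164448
[bench @ 0x2edfe1e0] t:0.164801 avg:0.172408 max:0.181514 min:0.164448
"""
after = """\
[bench @ 0x16d871e0] t:0.042296 avg:0.042296 max:0.042296 min:0.042296
[bench @ 0x16d871e0] t:0.027298 avg:0.037193 max:0.042296 min:0.027298
[bench @ 0x16d871e0] t:0.027388 avg:0.034742 max:0.042296 min:0.027298
"""

if __name__ == "__main__":
    print(f"speedup: {speedup(before, after):.2f}x")
```

The median is used rather than the running `avg:` field because the first two frames of each run are noticeably slower than the rest and would skew a short sample.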
--
Clément B.