[FFmpeg-devel] [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time for the yuv420p and nv{12, 21} formats
Matthieu Bouron
matthieu.bouron at gmail.com
Thu Mar 31 08:19:44 CEST 2016
On Wed, Mar 30, 2016 at 11:36:34PM +0200, Benoit Fouet wrote:
> Hi,
Hi Benoit,
>
> On 26/03/2016 13:05, Matthieu Bouron wrote:
> >On Sat, Mar 26, 2016 at 2:09 AM, Michael Niedermayer <michael at niedermayer.cc> wrote:
> >>>On Fri, Mar 25, 2016 at 11:46:01PM +0100, Matthieu Bouron wrote:
> >>>> >From: Matthieu Bouron<matthieu.bouron at stupeflix.com>
> >>>> >
> >>>> >---
> >>>> > libswscale/arm/yuv2rgb_neon.S | 89 ++++++++++++-------------------------------
> >>>> > 1 file changed, 24 insertions(+), 65 deletions(-)
> >>>
> >>>breaks build
> >>>
> >>> make distclean ; ../configure --cross-prefix=/usr/arm-linux-gnueabi/bin/
> >>>--cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon
> >>>-mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux
> >>>--enable-cross-compile && make -j12
> >>>
> >>>CC libavutil/arm/float_dsp_init_arm.o
> >>>src/libswscale/arm/yuv2rgb_neon.S: Assembler messages:
> >>>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
> >>>instruction should be in IT block -- `subeq r6,r6,r0'
> >>>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
> >>>instruction should be in IT block -- `addne r6,r7'
> >>>
> >[...]
> >
> >Patch updated with the relevant IT instructions added. It still builds
> >on my rpi2 setup, but I have not tested it on a setup like yours.
> >Can you confirm it builds/works on your setup?
> >
> >If it works, I will send an updated version of the next patch (07/10) to
> >resolve the conflicts.
> >
> >Matthieu
> >
> >0006-swscale-arm-yuv2rgb-only-process-one-line-at-a-time-.patch
> >
> >
> > From 7b3affff405b2b483fb16f549b69ce6f21d8a946 Mon Sep 17 00:00:00 2001
> >From: Matthieu Bouron<matthieu.bouron at stupeflix.com>
> >Date: Wed, 23 Mar 2016 11:26:13 +0000
> >Subject: [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
> > for the yuv420p and nv{12,21} formats
> >
> >---
> > libswscale/arm/yuv2rgb_neon.S | 92 +++++++++++++------------------------------
> > 1 file changed, 27 insertions(+), 65 deletions(-)
> >
> >diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
> >index ef7b0a6..6aeccae 100644
> >--- a/libswscale/arm/yuv2rgb_neon.S
> >+++ b/libswscale/arm/yuv2rgb_neon.S
> >@@ -105,16 +105,6 @@
> > compute_16px r2, d14, d15, \ofmt
> > .endm
> >-.macro process_2l_16px ofmt
> >- compute_premult d28, d29, d30, d31
> >-
> >- vld1.8 {q7}, [r4]! @ first line of luma
> >- compute_16px r2, d14, d15, \ofmt
> >-
> >- vld1.8 {q7}, [r12]! @ second line of luma
> >- compute_16px r11, d14, d15, \ofmt
> >-.endm
> >-
> > .macro load_args_nvx
> > push {r4-r12, lr}
> > vpush {q4-q7}
> >@@ -127,13 +117,9 @@
> > ldr r10,[sp, #128] @ r10 = y_coeff
> > vdup.16 d0, r10 @ d0 = y_coeff
> > vld1.16 {d1}, [r8] @ d1 = *table
> >- add r11, r2, r3 @ r11 = dst + linesize (dst2)
> >- add r12, r4, r5 @ r12 = srcY + linesizeY (srcY2)
>
> Nit: this leaves r11 and r12 unused by the NV conversions. It should be
> possible not to push/pop them.
> If not (which I would certainly understand), what would you think about
> moving the register saves out of the 'load_args_*' macros?
> It seems odd to have the push/vpush repeated in each macro rather than
> factored out, while the pop/vpop is done in only one place, at the end of
> each function.
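For reference, dropping r11 and r12 from the push list would also shift every
sp-relative argument load in load_args_nvx by 8 bytes. A rough C sketch of that
arithmetic, assuming 4-byte core registers and 16-byte q registers; the function
names are hypothetical, and the 24-byte argument offset is only inferred from
the `ldr r10,[sp, #128]` line in the quoted diff:

```c
#include <stddef.h>

enum { GPR_BYTES = 4, QREG_BYTES = 16 };

/* Bytes the prologue places between sp and the caller's stack arguments. */
static size_t saved_bytes(size_t n_gpr, size_t n_qreg)
{
    return n_gpr * GPR_BYTES + n_qreg * QREG_BYTES;
}

/* sp-relative offset of a stack argument that sits arg_off bytes into
 * the caller's argument area (hypothetical helper, for illustration). */
static size_t sp_offset(size_t n_gpr, size_t n_qreg, size_t arg_off)
{
    return saved_bytes(n_gpr, n_qreg) + arg_off;
}
```

With push {r4-r12, lr} (10 registers) and vpush {q4-q7} (4 registers),
sp_offset(10, 4, 24) gives the 128 used in the diff; with r11 and r12 removed
from the push list, sp_offset(8, 4, 24) gives 120.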
Thanks for the review. Unfortunately, I dropped this part of the patch set:
processing only one line at a time proved to be slower on devices other
than the rpi2. (I will keep your remark in mind if I ever switch back to
processing only one line at a time for all formats.)
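For context, the removed process_2l_16px macro ran compute_premult once and
then compute_16px for two luma lines, since in 4:2:0 one chroma row covers two
luma rows. Below is a minimal scalar C sketch of that trade-off, not the NEON
code: all names are hypothetical, and the coefficients are the common BT.601
integer approximations rather than the swscale coefficient table. The counter
only illustrates how often the chroma terms are recomputed; it says nothing
about where the measured slowdown actually came from.

```c
#include <stdint.h>

/* Chroma contribution shared by every luma sample under one (U, V) pair. */
typedef struct { int r, g, b; } ChromaTerm;

static int chroma_evals; /* illustration only: counts premult computations */

static ChromaTerm chroma_premult(uint8_t u, uint8_t v)
{
    ChromaTerm t;
    int d = u - 128, e = v - 128;
    chroma_evals++;
    t.r = 409 * e;
    t.g = -100 * d - 208 * e;
    t.b = 516 * d;
    return t;
}

static uint8_t clip8(int x)
{
    return x < 0 ? 0 : x > 255 ? 255 : (uint8_t)x;
}

/* Write one RGB24 pixel from a luma sample and a precomputed chroma term. */
static void put_px(uint8_t *dst, uint8_t y, ChromaTerm t)
{
    int c = 298 * (y - 16) + 128;
    dst[0] = clip8((c + t.r) / 256);
    dst[1] = clip8((c + t.g) / 256);
    dst[2] = clip8((c + t.b) / 256);
}

/* One line per call: the same chroma row is premultiplied again when
 * the second luma line of the pair is converted. w must be even. */
static void convert_1l(uint8_t *dst, const uint8_t *y,
                       const uint8_t *u, const uint8_t *v, int w)
{
    for (int x = 0; x < w; x += 2) {
        ChromaTerm t = chroma_premult(u[x / 2], v[x / 2]);
        put_px(dst + 3 * x,       y[x],     t);
        put_px(dst + 3 * (x + 1), y[x + 1], t);
    }
}

/* Two lines per call: each chroma term feeds a 2x2 block of luma. */
static void convert_2l(uint8_t *dst, int dst_stride,
                       const uint8_t *y, int y_stride,
                       const uint8_t *u, const uint8_t *v, int w)
{
    for (int x = 0; x < w; x += 2) {
        ChromaTerm t = chroma_premult(u[x / 2], v[x / 2]);
        put_px(dst + 3 * x,                    y[x],                t);
        put_px(dst + 3 * (x + 1),              y[x + 1],            t);
        put_px(dst + dst_stride + 3 * x,       y[y_stride + x],     t);
        put_px(dst + dst_stride + 3 * (x + 1), y[y_stride + x + 1], t);
    }
}
```

Calling convert_1l twice (once per luma line) and convert_2l once produce the
same pixels, but the one-line variant evaluates chroma_premult twice as often.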
The v2 patch set was posted in reply to the following thread:
https://ffmpeg.org/pipermail/ffmpeg-devel/2016-March/192272.html
Would you mind taking a look at it?
Matthieu
[...]