[FFmpeg-devel] [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time for the yuv420p and nv{12, 21} formats

Thu Mar 31 08:19:44 CEST 2016

On Wed, Mar 30, 2016 at 11:36:34PM +0200, Benoit Fouet wrote:
> Hi,

Hi Benoit,

> 
> On 26/03/2016 13:05, Matthieu Bouron wrote:
> >On Sat, Mar 26, 2016 at 2:09 AM, Michael Niedermayer <michael at niedermayer.cc>
> >wrote:
> >>>On Fri, Mar 25, 2016 at 11:46:01PM +0100, Matthieu Bouron wrote:
> >>>> >From: Matthieu Bouron<matthieu.bouron at stupeflix.com>
> >>>> >
> >>>> >---
> >>>> >  libswscale/arm/yuv2rgb_neon.S | 89 ++++++++++++-------------------------------
> >>>> >  1 file changed, 24 insertions(+), 65 deletions(-)
> >>>
> >>>breaks build
> >>>
> >>>  make distclean ; ../configure --cross-prefix=/usr/arm-linux-gnueabi/bin/
> >>>--cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon
> >>>-mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux
> >>>--enable-cross-compile && make -j12
> >>>
> >>>CC      libavutil/arm/float_dsp_init_arm.o
> >>>src/libswscale/arm/yuv2rgb_neon.S: Assembler messages:
> >>>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
> >>>instruction should be in IT block -- `subeq r6,r6,r0'
> >>>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
> >>>instruction should be in IT block -- `addne r6,r7'
> >>>
> >[...]
> >
> >Patch updated with the relevant IT instructions added. It still builds
> >on my rpi2 setup but has not been tested on a setup like yours.
> >Can you confirm it builds/works on your setup?
> >
> >If it works, I will send an updated version of the next patch (07/10) to
> >resolve the conflicts.
> >
> >Matthieu
> >
> >0006-swscale-arm-yuv2rgb-only-process-one-line-at-a-time-.patch
> >
> >
> > From 7b3affff405b2b483fb16f549b69ce6f21d8a946 Mon Sep 17 00:00:00 2001
> >From: Matthieu Bouron<matthieu.bouron at stupeflix.com>
> >Date: Wed, 23 Mar 2016 11:26:13 +0000
> >Subject: [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
> >  for the yuv420p and nv{12,21} formats
> >
> >---
> >  libswscale/arm/yuv2rgb_neon.S | 92 +++++++++++++------------------------------
> >  1 file changed, 27 insertions(+), 65 deletions(-)
> >
> >diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
> >index ef7b0a6..6aeccae 100644
> >--- a/libswscale/arm/yuv2rgb_neon.S
> >+++ b/libswscale/arm/yuv2rgb_neon.S
> >@@ -105,16 +105,6 @@
> >      compute_16px        r2, d14, d15, \ofmt
> >  .endm
> >-.macro process_2l_16px ofmt
> >-    compute_premult     d28, d29, d30, d31
> >-
> >-    vld1.8              {q7}, [r4]!                                    @ first line of luma
> >-    compute_16px        r2, d14, d15, \ofmt
> >-
> >-    vld1.8              {q7}, [r12]!                                   @ second line of luma
> >-    compute_16px        r11, d14, d15, \ofmt
> >-.endm
> >-
> >  .macro load_args_nvx
> >      push                {r4-r12, lr}
> >      vpush               {q4-q7}
> >@@ -127,13 +117,9 @@
> >      ldr                 r10,[sp, #128]                                 @ r10 = y_coeff
> >      vdup.16             d0, r10                                        @ d0  = y_coeff
> >      vld1.16             {d1}, [r8]                                     @ d1  = *table
> >-    add                 r11, r2, r3                                    @ r11 = dst + linesize (dst2)
> >-    add                 r12, r4, r5                                    @ r12 = srcY + linesizeY (srcY2)
> 
> Nit: this leaves r11 and r12 unused by the NV conversions. It should be
> possible not to push/pop them.
> If not (which I would certainly understand), what would you think about
> moving the register saves out of the 'load_args_*' macros?
> It seems odd that the push/vpush is repeated in each macro while the
> pop/vpop is done in only one place, at the end of each function.

Thanks for the review. Unfortunately, I dropped this part of the patch set:
processing only one line at a time proved to be slower on devices other
than the rpi2. (I will keep your remark in mind if I ever switch back to
processing only one line at a time for all formats.)

The v2 patch set was sent in reply to the following thread:
https://ffmpeg.org/pipermail/ffmpeg-devel/2016-March/192272.html

Would you mind taking a look at it?

Matthieu

[...]