[FFmpeg-devel] [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time for the yuv420p and nv{12, 21} formats
Benoit Fouet
benoit.fouet at free.fr
Wed Mar 30 23:36:34 CEST 2016
Hi,
Le 26/03/2016 13:05, Matthieu Bouron a écrit :
> On Sat, Mar 26, 2016 at 2:09 AM, Michael Niedermayer <michael at niedermayer.cc
>> >wrote:
>> >On Fri, Mar 25, 2016 at 11:46:01PM +0100, Matthieu Bouron wrote:
>>> > >From: Matthieu Bouron<matthieu.bouron at stupeflix.com>
>>> > >
>>> > >---
>>> > > libswscale/arm/yuv2rgb_neon.S | 89
>> >++++++++++++-------------------------------
>>> > > 1 file changed, 24 insertions(+), 65 deletions(-)
>> >
>> >breaks build
>> >
>> > make distclean ; ../configure --cross-prefix=/usr/arm-linux-gnueabi/bin/
>> >--cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon
>> >-mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux
>> >--enable-cross-compile && make -j12
>> >
>> >CC libavutil/arm/float_dsp_init_arm.o
>> >src/libswscale/arm/yuv2rgb_neon.S: Assembler messages:
>> >src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
>> >instruction should be in IT block -- `subeq r6,r6,r0'
>> >src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
>> >instruction should be in IT block -- `addne r6,r7'
>> >
> [...]
>
> Patch updated with the relevant it instructions added. It still does build
> on my rpi2 setup but is not tested on the same setup as yours.
> Can you confirm it builds/works on your setup ?
>
> If it works, i will send an updated version of the next patch (07/10) to
> resolve the conflicts.
>
> Matthieu
>
> 0006-swscale-arm-yuv2rgb-only-process-one-line-at-a-time-.patch
>
>
> From 7b3affff405b2b483fb16f549b69ce6f21d8a946 Mon Sep 17 00:00:00 2001
> From: Matthieu Bouron<matthieu.bouron at stupeflix.com>
> Date: Wed, 23 Mar 2016 11:26:13 +0000
> Subject: [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
> for the yuv420p and nv{12,21} formats
>
> ---
> libswscale/arm/yuv2rgb_neon.S | 92 +++++++++++++------------------------------
> 1 file changed, 27 insertions(+), 65 deletions(-)
>
> diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
> index ef7b0a6..6aeccae 100644
> --- a/libswscale/arm/yuv2rgb_neon.S
> +++ b/libswscale/arm/yuv2rgb_neon.S
> @@ -105,16 +105,6 @@
> compute_16px r2, d14, d15, \ofmt
> .endm
>
> -.macro process_2l_16px ofmt
> - compute_premult d28, d29, d30, d31
> -
> - vld1.8 {q7}, [r4]! @ first line of luma
> - compute_16px r2, d14, d15, \ofmt
> -
> - vld1.8 {q7}, [r12]! @ second line of luma
> - compute_16px r11, d14, d15, \ofmt
> -.endm
> -
> .macro load_args_nvx
> push {r4-r12, lr}
> vpush {q4-q7}
> @@ -127,13 +117,9 @@
> ldr r10,[sp, #128] @ r10 = y_coeff
> vdup.16 d0, r10 @ d0 = y_coeff
> vld1.16 {d1}, [r8] @ d1 = *table
> - add r11, r2, r3 @ r11 = dst + linesize (dst2)
> - add r12, r4, r5 @ r12 = srcY + linesizeY (srcY2)
Nit: this lets r11 and r12 unused by the NV conversions. It should be
possible not to push/pop them
If not (which I would certainly understand), what would you think about
moving the registers save out of the 'load_args_*' macro?
It seems weird to have all the push/vpush that are not factored, and the
pop/vpop that is done in only one place, at the end of each function.
[snip]
Looks good to me anyway (as well as the remainder of the series).
--
Ben
More information about the ffmpeg-devel
mailing list