[FFmpeg-devel] [PATCH v2 8/9] swscale/arm/yuv2rgb: save a few instructions by processing the luma line interleaved
Benoit Fouet
benoit.fouet at free.fr
Thu Mar 31 11:17:43 CEST 2016
Hi,
On 28/03/2016 21:19, Matthieu Bouron wrote:
> ---
> libswscale/arm/yuv2rgb_neon.S | 88 +++++++++++++++++--------------------------
> 1 file changed, 34 insertions(+), 54 deletions(-)
>
> diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
> index 124d7d3..6b911c8 100644
> --- a/libswscale/arm/yuv2rgb_neon.S
> +++ b/libswscale/arm/yuv2rgb_neon.S
>
> [...]
>
> @@ -94,25 +67,29 @@
> .ifc \ofmt,bgra
> compute_rgba d8, d7, d6, d9, d12, d11, d10, d13
> .endif
> +
> + vzip.8 d6, d10
> + vzip.8 d7, d11
> + vzip.8 d8, d12
> + vzip.8 d9, d13
A comment explaining the resulting interleaving (the lane order in d6-d13 after the vzip.8 instructions) would be nice here.
> vst4.8 {q3, q4}, [\dst,:128]!
> vst4.8 {q5, q6}, [\dst,:128]!
> -
> .endm
>
> .macro process_1l ofmt
> - compute_premult d28, d29, d30, d31
> - vld1.8 {q7}, [r4]!
> - compute r2, d14, d15, \ofmt
> + compute_premult
> + vld2.8 {d14, d15}, [r4]!
> + compute r2, \ofmt
> .endm
>
> .macro process_2l ofmt
> - compute_premult d28, d29, d30, d31
> + compute_premult
>
> - vld1.8 {q7}, [r4]! @ first line of luma
> - compute r2, d14, d15, \ofmt
> + vld2.8 {d14, d15}, [r4]! @ q7 = Y (interleaved)
> + compute r2, \ofmt
>
> - vld1.8 {q7}, [r12]! @ second line of luma
> - compute r11, d14, d15, \ofmt
> + vld2.8 {d14, d15}, [r12]! @ q7 = Y (interleaved)
> + compute r11, \ofmt
> .endm
>
What about adding a level of macro here? Something like:

    .macro process_1l_internal ofmt src_addr res
        compute_premult
        vld2.8 {d14, d15}, [\src_addr]!
        compute \res, \ofmt
    .endm

(Again, the naming could be changed according to your own taste :-) )

This way, we would get:

    .macro process_1l ofmt
        process_1l_internal \ofmt, r4, r2
    .endm

    .macro process_2l ofmt
        process_1l_internal \ofmt, r4, r2
        process_1l_internal \ofmt, r12, r11
    .endm
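
As for the interleaving comment requested above, here is a rough lane-level model in C of what I understand the vld2.8 / vzip.8 pairing to do. It is only a sketch: the function names are made up for illustration, and it assumes that the compute step works lane by lane, so it is treated as the identity here. Under those assumptions, vld2.8 splits the 16 luma bytes into even and odd pixels, and vzip.8 puts each output channel back in pixel order before the vst4.8 stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of "vld2.8 {dA, dB}, [src]": de-interleave 16 bytes so that
 * dA receives the even-indexed bytes and dB the odd-indexed ones. */
static void vld2_8(uint8_t even[8], uint8_t odd[8], const uint8_t src[16])
{
    for (int i = 0; i < 8; i++) {
        even[i] = src[2 * i];
        odd[i]  = src[2 * i + 1];
    }
}

/* Model of "vzip.8 dA, dB": interleave the lanes of both registers;
 * dA receives the first 8 interleaved bytes, dB the last 8. */
static void vzip_8(uint8_t a[8], uint8_t b[8])
{
    uint8_t tmp[16];
    for (int i = 0; i < 8; i++) {
        tmp[2 * i]     = a[i];
        tmp[2 * i + 1] = b[i];
    }
    memcpy(a, tmp, 8);
    memcpy(b, tmp + 8, 8);
}

/* Hypothetical check: returns 1 if de-interleaving followed by a zip
 * leaves pixels 0..7 in the first register and 8..15 in the second.
 * The per-lane compute step is assumed to be the identity here. */
int zip_restores_pixel_order(void)
{
    uint8_t src[16], lo[8], hi[8];

    for (int i = 0; i < 16; i++)
        src[i] = (uint8_t)i;            /* pixel index stands in for Y */

    vld2_8(lo, hi, src);                /* lo = 0,2,..,14  hi = 1,3,..,15 */
    assert(lo[0] == src[0] && hi[0] == src[1]);

    vzip_8(lo, hi);                     /* lo = 0..7       hi = 8..15 */

    for (int i = 0; i < 8; i++)
        if (lo[i] != i || hi[i] != i + 8)
            return 0;
    return 1;
}
```

With this model, zip_restores_pixel_order() returns 1, which is the property a comment next to the vzip.8 lines could state: after the zips, the first register of each pair holds the channel for pixels 0-7 and the second holds pixels 8-15.
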
--
Ben