[FFmpeg-devel] [PATCH] x86: hevc_mc: better register allocation
Michael Niedermayer
michaelni at gmx.at
Sun May 18 16:37:32 CEST 2014
On Sun, May 18, 2014 at 12:34:04AM +0200, Christophe Gisquet wrote:
> Hi,
>
> 2014-05-18 0:20 GMT+02:00 Christophe Gisquet <christophe.gisquet at gmail.com>:
> > Patch needs to be rewritten
>
> Here's an attempt, only tested (compilation+fate) on Win64.
>
> --
> Christophe
> hevc_mc.asm | 33 +++++++++++++++++++++++----------
> 1 file changed, 23 insertions(+), 10 deletions(-)
> 511373719ed90f69129591756547918c1d555ac5 0001-x86-hevc-dsp-better-register-allocation.patch
> From bcb6875c8c795486227b636b75a43d93408f207f Mon Sep 17 00:00:00 2001
> From: Christophe Gisquet <christophe.gisquet at gmail.com>
> Date: Sat, 17 May 2014 12:22:39 +0200
> Subject: [PATCH] x86: hevc dsp: better register allocation
>
> The xmm reg count was incorrect, and manually loading the gprs
> furthermore noticeably reduces the number needed.
>
> The modified function is used in weighted prediction, so only a few
> samples like WP_A_Toshiba_3.bit exhibit a change. For this one and
> Win64 (24 and 48 widths removed because of too few occurrences):
>
> before:
> 3872 decicycles in a32, 32761 runs, 7 skips
> 2194 decicycles in a16, 32766 runs, 2 skips
>
> after:
> 3767 decicycles in a32, 32765 runs, 3 skips
> 2119 decicycles in a16, 32767 runs, 1 skips
> ---
> libavcodec/x86/hevc_mc.asm | 33 +++++++++++++++++++++++----------
> 1 file changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm
> index 1fae38c..c7e8d07 100644
> --- a/libavcodec/x86/hevc_mc.asm
> +++ b/libavcodec/x86/hevc_mc.asm
> @@ -1098,19 +1098,32 @@ cglobal hevc_put_hevc_bi_qpel_hv%1_%2, 9, 11, 16, dst, dststride, src, srcstride
> %endmacro
>
> %macro WEIGHTING_FUNCS 2
> -cglobal hevc_put_hevc_uni_w%1_%2, 8, 10, 11, dst, dststride, src, srcstride, height, denom, wx, ox, shift
> - lea shiftd, [denomd+14-%2] ; shift = 14 - bitd + denom
> - shl oxd, %2-8 ; ox << (bitd - 8)
> - movd m2, wxd ; WX
> - movd m3, oxd ; OX
> - movd m4, shiftd ; shift
> +%if WIN64
> +cglobal hevc_put_hevc_uni_w%1_%2, 4, 5, 7, dst, dststride, src, srcstride, height, denom, wx, ox
> + mov r4d, denomm
> +%define SHIFT r4d
> +%else
> +cglobal hevc_put_hevc_uni_w%1_%2, 6, 6, 7, dst, dststride, src, srcstride, height, denom, wx, ox
> +%define SHIFT denomd
> +%endif
this is getting a little bit ugly ...
anyway review left to james & ronald & anyone else who likes to ...
[...]
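
For readers following the removed setup lines (shift = 14 - bitd + denom,
ox << (bitd - 8)), here is a rough scalar sketch in C of what the weighting
step computes per sample. Only the two formulas in the asm comments come from
the quoted hunk. The rounding term, the clipping, and the names clip_pixel and
weight_sample are assumptions made for illustration, not the code in
libavcodec.

```c
#include <stdint.h>

/* Hypothetical helper: clamp v to [0, 2^bitd - 1]. */
static int clip_pixel(int v, int bitd)
{
    int max = (1 << bitd) - 1;
    return v < 0 ? 0 : (v > max ? max : v);
}

/*
 * Hypothetical scalar model of the weighting of one intermediate sample.
 * src is the 14-bit intermediate value, wx the weight, ox the offset,
 * denom the log2 weight denominator, bitd the pixel bit depth.
 */
static int weight_sample(int16_t src, int denom, int wx, int ox, int bitd)
{
    int shift = denom + 14 - bitd;                /* shift = 14 - bitd + denom */
    int round = shift > 0 ? 1 << (shift - 1) : 0; /* assumed rounding term     */

    ox = ox * (1 << (bitd - 8));                  /* ox << (bitd - 8)          */

    /* Right-shifting a negative product relies on arithmetic shift,
     * which is what the usual compilers provide. */
    return clip_pixel(((src * wx + round) >> shift) + ox, bitd);
}
```

With denom = 6 and wx = 64 the weight is the identity, so an 8-bit pixel value
of 100 stored as the intermediate 6400 comes back as 100 plus the offset.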
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates