[FFmpeg-devel] [PATCH] x86: hevc_mc: better register allocation
James Almer
jamrial at gmail.com
Sat May 17 20:13:43 CEST 2014
On 17/05/14 11:58 AM, Christophe Gisquet wrote:
> Hi,
>
> this is more a proof of concept to show that the register allocation
> can be improved. This is the first simple example I found, albeit used
> only in a few cases.
>
> Benchmark under Win64:
> before:
> 3872 decicycles in a32, 32761 runs, 7 skips
> 2194 decicycles in a16, 32766 runs, 2 skips
>
> after:
> 3767 decicycles in a32, 32765 runs, 3 skips
> 2119 decicycles in a16, 32767 runs, 1 skips
>
[...]
> diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm
> index 1fae38c..89bbecd 100644
> --- a/libavcodec/x86/hevc_mc.asm
> +++ b/libavcodec/x86/hevc_mc.asm
> @@ -1098,19 +1098,24 @@ cglobal hevc_put_hevc_bi_qpel_hv%1_%2, 9, 11, 16, dst, dststride, src, srcstride
> %endmacro
>
> %macro WEIGHTING_FUNCS 2
> -cglobal hevc_put_hevc_uni_w%1_%2, 8, 10, 11, dst, dststride, src, srcstride, height, denom, wx, ox, shift
> - lea shiftd, [denomd+14-%2] ; shift = 14 - bitd + denom
> - shl oxd, %2-8 ; ox << (bitd - 8)
> - movd m2, wxd ; WX
> - movd m3, oxd ; OX
> - movd m4, shiftd ; shift
> +cglobal hevc_put_hevc_uni_w%1_%2, 4, 5, 7, dst, dststride, src, srcstride, height, denom, wx, ox
Even before your refactor, the function wasn't using 11 xmm regs,
or 10 gprs for that matter.
There are tons of functions in this file requesting more than 10 gp/xmm
registers but ultimately using fewer than that. This is especially bad for
win64, where xmm6-xmm15 are callee-saved, so every xmm reg requested beyond
the first six has to be saved and restored in the prologue/epilogue.