[FFmpeg-devel] [PATCH 1/4] lavc/flacenc: add sse4 version of the 16-bit lpc encoder
James Almer
jamrial at gmail.com
Tue Feb 25 04:42:15 CET 2014
On 20/02/14 3:48 PM, James Darnley wrote:
> From 1.8 to 2.4 times faster. Runtime is reduced by 2 to 39%. The
> speed-up generally increases with compression_level.
>
> This lpc encoder is not used with levels < 3 so it provides no speed-up
> in these cases.
> ---
> LICENSE | 1 +
> libavcodec/flacenc.c | 2 +-
> libavcodec/x86/Makefile | 3 +
> libavcodec/x86/flac_dsp_gpl.asm | 83 +++++++++++++++++++++++++++++++++++++++
> libavcodec/x86/flacdsp_init.c | 4 ++
> 5 files changed, 92 insertions(+), 1 deletions(-)
> create mode 100644 libavcodec/x86/flac_dsp_gpl.asm
>
[...]
> +.looplen:
> + pxor m0, m0
> + mov posj, orderq
> + xor negj, negj
> +
> + .looporder:
> + movd m2, [coefsq+posj*4] ; c = coefs[j]
> + SPLATD m2
> + movu m1, [smpq+negj*4-4] ; s = smp[i-j-1]
> + pmulld m1, m2
> + paddd m0, m1 ; p += c * s
Use the multiply-accumulate macro here in place of the pmulld/paddd pair:

PMACSDD m0, m1, m2, m0, m1

Same with the encoder (using PMACSDQL there instead). Do it in the unrolling
patches as well, of course.

You can then turn the functions into macros to get both SSE4 and XOP versions,
as I mentioned in a previous email.
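
For anyone following along, here is a rough scalar model in C of why the swap is safe. It is only a sketch. It assumes PMACSDD computes, per 32-bit lane, the accumulator plus the low 32 bits of the product, which is what a separate pmulld followed by paddd gives. The function names and the LANES constant are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

#define LANES 4 /* four 32-bit lanes in one XMM register */

/* Two-step form, modelling the quoted loop body: pmulld, then paddd. */
static void mul_then_add(uint32_t acc[LANES], const int32_t s[LANES], int32_t c)
{
    uint32_t prod[LANES];
    for (int k = 0; k < LANES; k++)
        prod[k] = (uint32_t)s[k] * (uint32_t)c; /* keep the low 32 bits of c * s */
    for (int k = 0; k < LANES; k++)
        acc[k] += prod[k];                      /* wrapping add, no saturation */
}

/* Fused form, modelling a single multiply-accumulate per lane. */
static void mul_accumulate(uint32_t acc[LANES], const int32_t s[LANES], int32_t c)
{
    for (int k = 0; k < LANES; k++)
        acc[k] += (uint32_t)s[k] * (uint32_t)c; /* p += c * s */
}

/* Hypothetical helper: returns 1 when both forms give identical lanes. */
static int forms_agree(const int32_t s[LANES], int32_t c, uint32_t start)
{
    uint32_t a[LANES], b[LANES];
    for (int k = 0; k < LANES; k++)
        a[k] = b[k] = start;
    mul_then_add(a, s, c);
    mul_accumulate(b, s, c);
    for (int k = 0; k < LANES; k++)
        if (a[k] != b[k])
            return 0;
    return 1;
}
```

Unsigned arithmetic is used so that wraparound is well defined in C; the low 32 bits of the product are the same whether the inputs are treated as signed or unsigned.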