[FFmpeg-devel] [PATCH] avcodec/x86/hevc: fix luma 12b overflow
Henrik Gramner
henrik at gramner.com
Mon Feb 26 00:30:21 EET 2024
On Sun, Feb 25, 2024 at 5:42 PM Ronald S. Bultje <rsbultje at gmail.com> wrote:
> + mova m13, [pw_8]
> + paddw m10, m12, m12
> + paddw m12, m10 ; 9 * (q0 - p0) - 3 * ( q1 - p1 )
> paddw m12, m13; + 8
This could use a memory operand instead, i.e. paddw m12, [pw_8], which
avoids the separate mova into m13.
> + paddw m10, m13, m13
> + paddw m13, m10 ; abs(9 * (q0 - p0) - 3 * ( q1 - p1 ))
> + paddw m13, [pw_8]
[...]
> + paddw m13, m12, m12
> + paddw m13, m12 ; 3*abs(m12)
> + paddw m13, [pw_8]
Another minor improvement would be to reorder the adds like (x + x) +
(x + 8) instead of ((x + x) + x) + 8 to allow for more
instruction-level parallelism.
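
To illustrate the reordering, here is a minimal scalar sketch in C that models one 16-bit lane of paddw as a wrapping add. The function names are made up for illustration and are not part of the patch. It only shows that both orderings give the same wrapped result, and that the reordered form shortens the dependency chain from three adds to two.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of paddw (wrapping add).
 * Hypothetical helper, for illustration only. */
static uint16_t add16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* ((x + x) + x) + 8: each add depends on the previous one,
 * so the dependency chain is three adds long. */
static uint16_t triple_plus8_serial(uint16_t x)
{
    uint16_t t = add16(x, x);
    t = add16(t, x);
    return add16(t, 8);
}

/* (x + x) + (x + 8): the first two adds only depend on x and can
 * issue in the same cycle, so the chain is two adds long. */
static uint16_t triple_plus8_parallel(uint16_t x)
{
    uint16_t a = add16(x, x);
    uint16_t b = add16(x, 8);
    return add16(a, b);
}

/* Counts inputs for which the two orderings disagree,
 * checked over every possible 16-bit value. */
static int count_mismatches(void)
{
    int bad = 0;
    for (uint32_t x = 0; x <= 0xFFFF; x++) {
        if (triple_plus8_serial((uint16_t)x) !=
            triple_plus8_parallel((uint16_t)x))
            bad++;
    }
    return bad;
}
```

In terms of the quoted asm, this would presumably correspond to computing x + x into m10, adding [pw_8] to the register holding x, and then adding the two together. Whether it is measurably faster depends on the surrounding code.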