[FFmpeg-devel] [Patch]x86/hevc : new idct + ASM
Pierre Edouard Lepere
Pierre-Edouard.Lepere at insa-rennes.fr
Wed Jun 4 13:21:05 CEST 2014
Here's a new patch incorporating the suggestions:
- uses fewer registers
- the 4x4 function uses MMX
- uses immediates where applicable
Regards,
Pierre-Edouard Lepere
----- Original Message -----
From: "James Almer" <jamrial at gmail.com>
To: "FFmpeg development discussions and patches" <ffmpeg-devel at ffmpeg.org>
Sent: Monday, 2 June 2014 22:32:09
Subject: Re: [FFmpeg-devel] [Patch]x86/hevc : new idct + ASM
On 02/06/14 6:15 AM, Pierre Edouard Lepere wrote:
> +%macro TRANSFORM_DC_ADD 2
> +cglobal hevc_put_transform%1x%1_dc_add_%2, 4, 6, 4, dst, coeffs, stride, col_limit, temp
This should be 4, 5, 4. You're using only one temp reg, not two.
> + xor tempw, tempw
No need for this. The mov below should clear the reg. Same with the "xor tempq, tempq" and
"pxor m2, m2" a couple instructions below.
> + mov tempw, [coeffsq]
> + add tempw, 1
> + sar tempw, 1
> + add tempw, [add_%2]
Why use constants for a single value when you can use immediates?
%if %2 == 8
add tempw, 32
%else
add tempw, 8
%endif
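For reference, the scalar arithmetic performed by the quoted instructions can be sketched in C as below. This is only a minimal sketch, not code from the patch: the name dc_value and the bit_depth parameter are hypothetical, it uses int where the asm works in a 16-bit register (so wraparound on extreme inputs may differ), and it assumes >> on a negative int is an arithmetic shift like sar.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the DC computation quoted above.
 * bit_depth plays the role of the macro argument %2 (8 or 10). */
static int dc_value(int16_t coeff0, int bit_depth)
{
    /* shift must be at least 1 so that the rounding term is defined */
    assert(bit_depth >= 8 && bit_depth <= 13);

    int shift = 14 - bit_depth;    /* sar tempw, 14-%2           */
    int round = 1 << (shift - 1);  /* 32 for 8-bit, 8 for 10-bit */
    int dc    = (coeff0 + 1) >> 1; /* add tempw, 1 ; sar tempw, 1 */

    return (dc + round) >> shift;  /* add the immediate, then shift */
}
```

With these definitions the rounding term is 32 for 8-bit and 8 for 10-bit, which matches the two immediates suggested above.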
> + sar tempw, 14-%2
> + movd m0, tempd
> + punpcklwd m0, m0
> + pshufd m0, m0, 0
Use SPLATW here. It will come in handy if you use mmx registers as Ronald suggested for
the 4x4 case. Just make sure to declare the functions as mmxext and not mmx as the latter
doesn't have pshuf* instructions and will instead expand into four punpck* instructions.
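As an illustration of what the splat achieves, here is a rough scalar model in C of broadcasting one 16-bit word to every lane of a 64-bit (mmx-sized) register. The names splat_word and lane are hypothetical and this is not how SPLATW itself is implemented; it only models the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: copy a 16-bit word into all four 16-bit lanes
 * of a 64-bit value, as a word splat does for an mmx register. */
static uint64_t splat_word(uint16_t w)
{
    /* Multiplying by 0x0001000100010001 places w in each lane;
     * the partial products never overlap, so there are no carries. */
    return (uint64_t)w * UINT64_C(0x0001000100010001);
}

/* Hypothetical helper: read back lane i (0 = lowest word). */
static uint16_t lane(uint64_t v, int i)
{
    assert(i >= 0 && i < 4);
    return (uint16_t)(v >> (16 * i));
}
```

For example, splat_word(0x1234) yields 0x1234123412341234, and lane() returns 0x1234 for each of the four lanes.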
> + pxor m1, m1
> + xor tempq, tempq
> + mov tempd, %1
> +.loop
> + pxor m2, m2
> +%if %1 == 2 || (%2 == 8 && %1 <= 4)
There doesn't seem to be a %1 == 2 case.
> + movd m2, [dstq] ; load data from source
> +%elif %1 == 4 || (%2 == 8 && %1 <= 8)
> + movq m2, [dstq] ; load data from source
> +%else
> + movdqu m2, [dstq] ; load data from source
You can use movu and movh here. They will expand to movdqu/movq or movq/movd depending
on whether you're using xmm or mmx registers.
Something like this:
%if %2 == 8 && %1 <= mmsize/2
movh m2,[dstq]
%else
movu m2,[dstq]
%endif
Same for the store version at the end of the function.
This only applies if you go with mmx registers for the 4x4 case, of course.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel at ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-added-new-idct-and-first-idct-asm.patch
Type: text/x-patch
Size: 26816 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20140604/acd8e080/attachment.bin>