[FFmpeg-devel] [Patch]x86/hevc : new idct + ASM
Ronald S. Bultje
rsbultje at gmail.com
Mon Jun 2 14:09:31 CEST 2014
Hi,
On Mon, Jun 2, 2014 at 5:15 AM, Pierre Edouard Lepere <
Pierre-Edouard.Lepere at insa-rennes.fr> wrote:
> Here are our new transforms with a basic dc_add done in ASM. The ASM
> should work in SSE2 and 32bit
So ... We took a slightly different approach in ffvp9, where we make the
decision within the function. This allows more fine-grained sub-idct
decisions than just dc_add or full idct.
Some basic asm comments (not completely reviewed yet):
> +.loop
> + pxor m2, m2
> +%if %1 == 2 || (%2 == 8 && %1 <= 4)
> + movd m2, [dstq] ; load data from source
> +%elif %1 == 4 || (%2 == 8 && %1 <= 8)
> + movq m2, [dstq] ; load data from source
> +%else
> + movdqu m2, [dstq] ; load data from source
> +%endif
The pxor is unnecessary: movd and movq already zero the upper part of the
register, and movdqu overwrites all of it.
Who zeroes the coefficient block after use? The idct function, or some
place outside?
> + ; CLIPW m2, m1, [max_pixels_10] @TODO fix seg fault when
Probably alignment-related. Make sure max_pixels_10 is 16-byte aligned, since
SSE2 instructions such as pminsw fault on an unaligned memory operand. In
particular, this would hurt:
> +SECTION_RODATA
Default alignment is 16.
> +add_8: dw 32
This one is 16-byte aligned.
> +add_10: dw 8
This one is only 2-byte aligned (a dw is 2 bytes, so it sits at offset 2).
> +max_pixels_10: times 8 dw ((1 << 10)-1)
And thus this one is only 4-byte aligned (offset 4). Re-order the constants
(max_pixels_10 first, then the small elements) to fix.
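As a rough sketch of that re-ordering (assuming these three constants are the
only ones in the section, that the small ones are never used as full-width
memory operands, and that the file includes x86util.asm the way other FFmpeg
asm files do), the data section could look something like this:

```nasm
%include "libavutil/x86/x86util.asm"

SECTION_RODATA                            ; x86inc macro, 16-byte section alignment by default

; Widest constant first, so it starts on the section's 16-byte boundary.
max_pixels_10: times 8 dw ((1 << 10)-1)   ; offset 0, 16 bytes long
; Scalar words follow; nothing here depends on their alignment
; (assumption: they are only read as 16-bit values).
add_8:         dw 32                      ; offset 16
add_10:        dw 8                       ; offset 18
```

With this layout the memory operand in CLIPW m2, m1, [max_pixels_10] would sit
on a 16-byte boundary, which is what the SSE2 version needs.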
Note also that the 4x4 transform only ever uses one half of the
register, so make it mmx instead of xmm if you like.
Ronald