[FFmpeg-devel] [Patch]x86/hevc : new idct + ASM

Mon Jun 2 14:09:31 CEST 2014

Hi,

On Mon, Jun 2, 2014 at 5:15 AM, Pierre Edouard Lepere <
Pierre-Edouard.Lepere at insa-rennes.fr> wrote:

> Here are our new transforms with a basic dc_add done in ASM. The ASM
> should work in SSE2 and 32bit

So ... We took a slightly different approach in ffvp9, where we make the
decision within the function. This allows more fine-grained sub-idct
decisions than just dc_add or full idct.
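
As a rough sketch of what deciding inside the function could look like (this
is not the actual ffvp9 code; the enum, the function name, the eob parameter
and the quarter-block cutoff are all hypothetical, chosen only for
illustration), assume the function is told how many coefficients in scan
order are coded:

```c
#include <assert.h>

/* Hypothetical set of transform paths, from cheapest to most expensive. */
enum idct_path { IDCT_DC_ONLY, IDCT_PARTIAL, IDCT_FULL };

/* Pick a path for an n x n block. eob is assumed to be the number of
 * coefficients, in scan order starting at DC, up to and including the
 * last nonzero one. */
enum idct_path choose_idct_path(int n, int eob)
{
    assert(n > 0 && eob >= 1 && eob <= n * n);

    if (eob == 1)
        return IDCT_DC_ONLY;   /* only the DC coefficient is nonzero */
    if (eob <= (n * n) / 4)
        return IDCT_PARTIAL;   /* hypothetical cutoff for a reduced idct */
    return IDCT_FULL;
}
```

With these made-up thresholds, an 8x8 block with eob 10 would take the
partial path and one with eob 40 the full path; the caller only ever sees one
entry point.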

Some basic asm comments (not completely reviewed yet):
> +.loop
> +    pxor              m2, m2
> +%if %1 == 2 || (%2 == 8 && %1 <= 4)
> +    movd              m2, [dstq]      ; load data from source
> +%elif %1 == 4 || (%2 == 8 && %1 <= 8)
> +    movq              m2, [dstq]      ; load data from source
> +%else
> +    movdqu            m2, [dstq]      ; load data from source
> +%endif

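The pxor is unnecessary: every branch overwrites all of m2 (movd and movq
zero the upper bits, and movdqu loads the full register).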

Who zeroes the coefficient block after use? The idct function, or some
place outside?

> +  ;  CLIPW             m2, m1, [max_pixels_10]                  @TODO fix seg fault when

Probably alignment-related. Make sure max_pixels_10 is 16-byte aligned. In
particular, this would hurt:

> +SECTION_RODATA

Default alignment is 16.

> +add_8:                  dw 32

This one is 16-byte aligned.

> +add_10:                 dw  8

This one is only 2-byte aligned (dw emits a 16-bit word, so it sits at
offset 2).

> +max_pixels_10:          times 8  dw ((1 << 10)-1)

And thus this one is only 4-byte aligned (offset 4). Re-order the constants
(max_pixels_10 first, then the small elements) to fix.
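
To make the offset arithmetic concrete, here is a small C model of the
layout. It is only a sketch: the helper names are made up, and it assumes dw
emits 2 bytes (as in NASM/YASM) and that the section starts on a 16-byte
boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Largest power of two that divides offset, capped at the section
 * alignment. Offset 0 is the section start and gets the full alignment. */
size_t label_alignment(size_t offset, size_t section_align)
{
    /* section_align must be a nonzero power of two */
    assert(section_align != 0 && (section_align & (section_align - 1)) == 0);

    size_t a = section_align;
    while (a > 1 && (offset % a) != 0)
        a >>= 1;
    return a;
}

/* Byte offset of the label at index idx, given the byte sizes of the
 * data items in the order they are emitted. */
size_t label_offset(const size_t *sizes, size_t idx)
{
    size_t off = 0;
    for (size_t i = 0; i < idx; i++)
        off += sizes[i];
    return off;
}
```

With the sizes in patch order, {2, 2, 16}, max_pixels_10 lands at offset 4
and label_alignment(4, 16) gives 4. With max_pixels_10 first, {16, 2, 2}, it
lands at offset 0 and gets the full 16.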

Note also that the 4x4 transform only ever uses one half of the register,
so you could make it mmx instead of xmm if you like.

Ronald