[FFmpeg-devel] [PATCH v3] mdct15: add assembly optimizations for the 15-point FFT
Henrik Gramner
henrik at gramner.com
Fri Jun 23 03:44:44 EEST 2017
On Fri, Jun 23, 2017 at 12:44 AM, Rostislav Pehlivanov
<atomnuker at gmail.com> wrote:
> +%macro FFT5 3 ; %1 - in_offset, %2 - dst1 (64bit used), %3 - dst2
> + movddup xm0, [inq + 0*16 + 0 + %1] ; in[ 0].re, in[ 0].im, in[ 0].re, in[ 0].im
> + movsd xm1, [inq + 1*16 + 8 + %1] ; in[ 3].re, in[ 3].im, 0, 0
> + movsd xm2, [inq + 2*16 + 16 + %1] ; in[ 6].re, in[ 6].im, 0, 0
> + movsd xm3, [inq + 4*16 + 8 + %1] ; in[ 9].re, in[ 9].im, 0, 0
> + movsd xm4, [inq + 6*16 + 0 + %1] ; in[12].re, in[12].im, 0, 0
> +
> + vinsertf128 m0, xm0, 1
> +
> + shufps xm1, xm2, q1010 ; in[ 3].re, in[ 3].im, in[ 6].re, in[ 6].im
> + shufps xm4, xm3, q1010 ; in[12].re, in[12].im, in[ 9].re, in[ 9].im
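Use vbroadcastsd instead of movddup + vinsertf128; it produces the same broadcast in a single instruction.
Use movhps to load in[6] and in[9] directly into the high halves instead of movsd + shufps.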
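Both substitutions can be checked with a lane-level model. The C below is a hypothetical sketch: the `ymm` type and every helper name are illustrative, not FFmpeg code. It assumes the VEX-encoded forms that x86inc emits under AVX, which zero the lanes an XMM-sized instruction does not write. It also assumes `in` is an array of interleaved re/im floats. The model compares register contents only. It says nothing about speed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lane model of a YMM register; f[0..3] is the XMM half.
 * Helpers model VEX encodings: lanes an XMM-sized op does not write are zeroed. */
typedef struct { float f[8]; } ymm;

/* vmovddup xmm, m64: one complex value into lanes 0-1 and 2-3. */
static ymm movddup(const float *m)
{
    ymm r = {{0}};
    r.f[0] = r.f[2] = m[0];
    r.f[1] = r.f[3] = m[1];
    return r;
}

/* vinsertf128 ymm, ymm, xmm, 1: low half of b becomes the high half of a. */
static ymm vinsertf128_hi(ymm a, ymm b)
{
    memcpy(&a.f[4], &b.f[0], 4 * sizeof(float));
    return a;
}

/* vbroadcastsd ymm, m64: one complex value into all four 64-bit slots. */
static ymm vbroadcastsd(const float *m)
{
    ymm r;
    for (int k = 0; k < 4; k++) {
        r.f[2 * k]     = m[0];
        r.f[2 * k + 1] = m[1];
    }
    return r;
}

/* vmovsd xmm, m64: load lanes 0-1. */
static ymm movsd_load(const float *m)
{
    ymm r = {{0}};
    r.f[0] = m[0];
    r.f[1] = m[1];
    return r;
}

/* vmovhps xmm, xmm, m64: keep lanes 0-1 of a, load lanes 2-3 from memory. */
static ymm movhps_load(ymm a, const float *m)
{
    ymm r = {{0}};
    r.f[0] = a.f[0];
    r.f[1] = a.f[1];
    r.f[2] = m[0];
    r.f[3] = m[1];
    return r;
}

/* vshufps xmm, xmm, xmm, imm: lanes 0-1 picked from a, lanes 2-3 from b. */
static ymm shufps_xmm(ymm a, ymm b, int imm)
{
    ymm r = {{0}};
    r.f[0] = a.f[imm & 3];
    r.f[1] = a.f[(imm >> 2) & 3];
    r.f[2] = b.f[(imm >> 4) & 3];
    r.f[3] = b.f[(imm >> 6) & 3];
    return r;
}

static int same(ymm a, ymm b) { return memcmp(&a, &b, sizeof a) == 0; }

/* in + 2*k points at in[k].re */
static int broadcast_matches(const float *in)
{
    ymm m0 = movddup(in);                                   /* movddup xm0, [in] */
    return same(vinsertf128_hi(m0, m0), vbroadcastsd(in));  /* vinsertf128 m0, xm0, 1 */
}

static int movhps_matches(const float *in)
{
    /* patch: movsd xm1, in[3]; movsd xm2, in[6]; shufps xm1, xm2, q1010 (0x44) */
    ymm old1 = shufps_xmm(movsd_load(in + 6), movsd_load(in + 12), 0x44);
    /* suggestion: movsd xm1, in[3]; movhps xm1, in[6] */
    ymm new1 = movhps_load(movsd_load(in + 6), in + 12);
    /* same pattern for xm4: in[12] low, in[9] high */
    ymm old4 = shufps_xmm(movsd_load(in + 24), movsd_load(in + 18), 0x44);
    ymm new4 = movhps_load(movsd_load(in + 24), in + 18);
    return same(old1, new1) && same(old4, new4);
}
```

Each check returns nonzero when the shorter sequence fills the register exactly as the patch's sequence does.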
> +%macro BUTTERFLIES_DC 2 ; %1 - exptab_offset, %2 - out
> + movaps m0, [exptabq + %1]
> + vextractf128 xm1, m0, 1
> +
> + mulps xm1, xm10
> + mulps xm0, xm9
mulps xm0, xm9, [exptabq + %1]
mulps xm1, xm10, [exptabq + %1 + 16]
(cross-lane shuffles are slow, avoid them when possible)
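A lane-level sketch of the `BUTTERFLIES_DC` change follows. The patch loads 32 bytes of `exptab` and then extracts the upper 16 with `vextractf128`. The suggestion instead multiplies each 16-byte half straight from memory. The names are hypothetical: `x9` and `x10` stand in for `xm9` and `xm10`, and `exptab` points at `exptabq + %1`. The model only confirms that both versions compute the same products. Because IEEE multiplication is commutative, operand order does not change the results.

```c
#include <assert.h>
#include <string.h>

typedef struct { float f[4]; } xmm;   /* hypothetical XMM lane model */

/* mulps: lane-wise product */
static xmm mulps(xmm a, xmm b)
{
    xmm r;
    for (int i = 0; i < 4; i++)
        r.f[i] = a.f[i] * b.f[i];
    return r;
}

/* movaps xmm, [m] */
static xmm load16(const float *m)
{
    xmm r;
    memcpy(r.f, m, sizeof r.f);
    return r;
}

/* Patch: movaps m0, [exptab]; vextractf128 xm1, m0, 1; mulps xm1, xm10; mulps xm0, xm9 */
static void dc_patch(const float *exptab, xmm x9, xmm x10, xmm *out0, xmm *out1)
{
    float m0[8];
    memcpy(m0, exptab, sizeof m0);     /* 256-bit load */
    xmm lo = load16(m0);
    xmm hi = load16(m0 + 4);           /* vextractf128: upper 128 bits */
    *out0 = mulps(lo, x9);
    *out1 = mulps(hi, x10);
}

/* Suggestion: mulps xm0, xm9, [exptab]; mulps xm1, xm10, [exptab + 16] */
static void dc_suggested(const float *exptab, xmm x9, xmm x10, xmm *out0, xmm *out1)
{
    *out0 = mulps(x9, load16(exptab));        /* byte offset 0 */
    *out1 = mulps(x10, load16(exptab + 4));   /* byte offset 16 = 4 floats */
}
```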
> +%macro BUTTERFLIES_AC 2 ; exptab, exptab_offset, src1, src2, src3, out (uses m0-m3)
> + mulps m0, m12, [exptabq + 64*0 + 0*mmsize + %1]
> + mulps m1, m12, [exptabq + 64*0 + 1*mmsize + %1]
> + mulps m2, m13, [exptabq + 64*1 + 0*mmsize + %1]
> + mulps m3, m13, [exptabq + 64*1 + 1*mmsize + %1]
> +
> + shufps m1, m1, q2301
> + shufps m3, m3, q2301
> +
> + addps m0, m1
> + addps m2, m3
> + addps m0, m2
Adding m1 and m3 before shuffling should allow you to remove one
shufps. It might also be beneficial to reorder the multiplies so that
m1 and m3 are calculated before m0 and m2.
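The reasoning behind dropping one `shufps` is that a lane permutation distributes over lane-wise addition: swap(m1) + swap(m3) == swap(m1 + m3). The hypothetical C model below treats `m0`..`m3` as the already-computed products and compares the two instruction orders. One caveat: the suggested order adds (m0 + m2) first, so in floating point the results can differ in the last bit. The test therefore uses small integers, for which every sum is exact. Reordering the multiplies is a scheduling change, and this model does not capture it.

```c
#include <assert.h>
#include <string.h>

typedef struct { float f[8]; } ymm;   /* hypothetical YMM lane model */

static ymm addps(ymm a, ymm b)
{
    for (int i = 0; i < 8; i++)
        a.f[i] += b.f[i];
    return a;
}

/* shufps m, m, q2301 (imm 0xB1) swaps re/im inside each complex pair.
 * On YMM it repeats per 128-bit lane, which for this pattern equals
 * swapping every adjacent pair of the 8 floats. */
static ymm swap_reim(ymm a)
{
    ymm r;
    for (int i = 0; i < 8; i += 2) {
        r.f[i]     = a.f[i + 1];
        r.f[i + 1] = a.f[i];
    }
    return r;
}

/* Patch: two shuffles, then three adds */
static ymm ac_patch(ymm m0, ymm m1, ymm m2, ymm m3)
{
    m1 = swap_reim(m1);
    m3 = swap_reim(m3);
    m0 = addps(m0, m1);
    m2 = addps(m2, m3);
    return addps(m0, m2);
}

/* Suggestion: add m1 and m3 first, shuffle once */
static ymm ac_suggested(ymm m0, ymm m1, ymm m2, ymm m3)
{
    m1 = addps(m1, m3);
    m1 = swap_reim(m1);
    m0 = addps(m0, m2);
    return addps(m0, m1);
}
```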
> +cglobal fft15, 4, 6, 14, out, in, exptab, stride, stride3, stride5
> +%define out0q inq
> + shl strideq, 3
> +
> + movaps m5, [exptabq + 480]
> + vextractf128 xm6, m5, 1
Use two loads instead of a cross-lane shuffle.
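In x86inc terms, one reading of this suggestion is to keep `movaps m5, [exptabq + 480]` and replace the extract with `movaps xm6, [exptabq + 496]`. The upper 16 bytes of the 32-byte load start at 480 + 16. If only `xm5` is used later, the first load could presumably be narrowed to 128 bits too, but the quoted hunk does not show that. The hypothetical C model below only checks the offset arithmetic: the extracted half and the second load see the same floats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte offsets taken from the quoted fft15 prologue; floats are 4 bytes. */
enum { EXPTAB_OFF = 480, XMM_BYTES = 16 };

typedef struct { float f[4]; } xmm;   /* hypothetical XMM lane model */

static xmm load_xmm(const float *exptab, size_t byte_off)
{
    xmm r;
    memcpy(r.f, (const char *)exptab + byte_off, sizeof r.f);
    return r;
}

/* Patch: movaps m5, [exptab + 480]; vextractf128 xm6, m5, 1 */
static xmm xm6_patch(const float *exptab)
{
    float m5[8];
    memcpy(m5, (const char *)exptab + EXPTAB_OFF, sizeof m5);
    xmm r;
    memcpy(r.f, m5 + 4, sizeof r.f);   /* upper 128 bits of m5 */
    return r;
}

/* Suggestion: a second 16-byte load at 480 + 16 = 496 */
static xmm xm6_suggested(const float *exptab)
{
    return load_xmm(exptab, EXPTAB_OFF + XMM_BYTES);
}
```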