[FFmpeg-devel] [PATCH 02/10] x86: dcadsp: implement SSE lfe_dir
James Almer
jamrial at gmail.com
Sun Apr 6 18:35:42 CEST 2014
On 06/04/14 1:27 PM, Hendrik Leppkes wrote:
> On Sun, Apr 6, 2014 at 5:34 PM, Timothy Gu <timothygu99 at gmail.com> wrote:
>> On Apr 6, 2014 8:24 AM, "Hendrik Leppkes" <h.leppkes at gmail.com> wrote:
>>>
>>> On Fri, Feb 14, 2014 at 5:00 PM, Christophe Gisquet
>>> <christophe.gisquet at gmail.com> wrote:
>>>> Results for Arrandale/Windows:
>>>> 32: 1670 -> 316
>>>> 64: 728 -> 298
>>>> ---
>>>> libavcodec/x86/dcadsp.asm | 87
>> ++++++++++++++++++++++++++++++++++++++++++++
>>>> libavcodec/x86/dcadsp_init.c | 4 ++
>>>> 2 files changed, 91 insertions(+)
>>>>
>>>> diff --git a/libavcodec/x86/dcadsp.asm b/libavcodec/x86/dcadsp.asm
>>>> index a0995c9..f4149d2 100644
>>>> --- a/libavcodec/x86/dcadsp.asm
>>>> +++ b/libavcodec/x86/dcadsp.asm
>>>> @@ -88,3 +88,90 @@ INT8X8_FMUL_INT32
>>>>
>>>> INIT_XMM sse4
>>>> INT8X8_FMUL_INT32
>>>> +
>>>> +; %1=v0/v1 %2=in1 %3=in2
>>>> +%macro FIR_LOOP 2-3
>>>> +.loop%1:
>>>> +%define va m1
>>>> +%define vb m2
>>>> +%if %1
>>>> +%define OFFSET 0
>>>> +%else
>>>> +%define OFFSET NUM_COEF*count
>>>> +%endif
>>>> +; for v0, incrementint and for v1, decrementing
>>>> + mova va, [cf0q + OFFSET]
>>>> + mova vb, [cf0q + OFFSET + 4*NUM_COEF]
>>>> +%if %0 == 3
>>>> + mova m4, [cf0q + OFFSET + mmsize]
>>>> + mova m0, [cf0q + OFFSET + 4*NUM_COEF + mmsize]
>>>> +%endif
>>>> + mulps va, %2
>>>> + mulps vb, %2
>>>> +%if %0 == 3
>>>> + mulps m4, %3
>>>> + mulps m0, %3
>>>> + addps va, m4
>>>> + addps vb, m0
>>>> +%endif
>>>> + ; va = va1 va2 va3 va4
>>>> + ; vb = vb1 vb2 vb3 vb4
>>>> +%if %1
>>>> + SWAP va, vb
>>>> +%endif
>>>> + mova m4, va
>>>> + unpcklps va, vb ; va3 vb3 va4 vb4
>>>> + unpckhps m4, vb ; va1 vb1 va2 vb2
>>>> + addps m4, va ; va1+3 vb1+3 va2+4 vb2+4
>>>> + movhlps vb, m4 ; va1+3 vb1+3
>>>> + addps vb, m4 ; va0..4 vb0..4
>>>> + movh [outq + count], vb
>>>
>>> I got a complaint about a crash on a SSE-only system, and the disasm I
>>> got from the user was pointing at this exact line:
>>>
>>> ......
>>> 6A2F283F movaps xmm4,xmm1
>>> 6A2F2842 unpcklps xmm1,xmm2
>>> 6A2F2845 unpckhps xmm4,xmm2
>>> 6A2F2848 addps xmm4,xmm1
>>> 6A2F284B movhlps xmm2,xmm4
>>> 6A2F284E addps xmm2,xmm4
>>> 6A2F2851 movq mmword ptr [eax+ecx],xmm2 <<<
>>>
>>> The "movh" generates a movq, which according to my quick research
>>> seems to be SSE2-only, and causes an illegal instruction on the users
>>> CPU.
>>
>> That should probably be movlps. See
>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=b5161908e06b4497bf663510fb495ba97a6fd2b5
>> .
>>
>
> I sent a patch based on that, thanks for remembering the earlier case. :)
Ideally, this should be fixed on x86inc so movh expands to movq or movlps
depending on the requested instruction set, so this doesn't happen again.
And afaik, movq using XMM registers is what's not available on SSE.
The problem is not that the dest operand is a memory address, but that
integer instructions didn't get the 128bits treatment until SSE2.
More information about the ffmpeg-devel
mailing list