[FFmpeg-devel] [PATCH] MMX2/SSSE3 VC1 loop filter

Mon Jul 5 23:02:01 CEST 2010

On Mon, Jul 5, 2010 at 1:30 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Mon, Jul 5, 2010 at 1:44 AM, David Conrad <lessen42 at gmail.com> wrote:
>> Updated to patch cleanly, compile, and added mmx/sse2 versions
> [..]
>> +SECTION_RODATA
>> +pw_4: times 8 dw 4
>> +pw_5: times 8 dw 5
>
> cextern pw_4, pw_5 (i.e. use the ones in dsputil_mmx.c) maybe?
>
>> +; low, high (src), zero
>> +%macro UNPACK2 4
>> +    mova      m%2, m%3
>> +    punpckh%1 m%3, m%4
>> +    punpckl%1 m%2, m%4
>> +%endmacro
>
> duplicate of SBUTTERFLY in x86util.asm, maybe?
>
>> +%macro STORE_4_WORDS_MMX 6
>> +    movd   %6, %5
>> +%if mmsize==16
>> +    psrldq %5, 4
>> +%else
>> +    psrlq  %5, 32
>> +%endif
>> +    mov    %1, %6w
>> +    shr    %6, 16
>> +    mov    %2, %6w
>> +    movd   %6, %5
>> +    mov    %3, %6w
>> +    shr    %6, 16
>> +    mov    %4, %6w
>> +%endmacro
>
> For VP8 H loopfilter, I save the neighbouring two rows (p1/q1) and
> write the four out as dwords using movd at once from the mm register,
> have you tried that (I'm not asking you to rewrite it if you didn't),
> and if so, is it faster?
>
> (I suppose this isn't very practical because of the SSE4 version below...)
>
>> +%macro STORE_4_WORDS_SSE4 6
>> +    pextrw %1, %5, %6+0
>> +    pextrw %2, %5, %6+1
>> +    pextrw %3, %5, %6+2
>> +    pextrw %4, %5, %6+3
>> +%endmacro
> [..]

I don't recall pextrw being SSE4...
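
(As far as I know, pextrw with a register destination goes back to
MMX2/SSE, or SSE2 for xmm registers; only the form with a memory
destination is SSE4.1, so the name would fit if %1-%4 are memory
operands here.)

For reference, a rough scalar C model of what the two STORE_4_WORDS
macros quoted above appear to do: scatter the four low 16-bit words of a
vector register to four separate addresses. This is only a sketch; the
function names and the uint64_t stand-in for the register are
hypothetical, and the real SSE4 macro also takes a starting word index
(%6) that is left out here.

```c
#include <stdint.h>
#include <string.h>

/* Store one 16-bit word through a possibly unaligned pointer. */
static void store_word(uint8_t *dst, uint16_t w)
{
    memcpy(dst, &w, sizeof(w));
}

/* Model of STORE_4_WORDS_MMX: move 32 bits to a GPR, store its low
 * word, shift the GPR right by 16, store again, then shift the vector
 * right by 32 bits and repeat. */
static void store_4_words_movd(uint8_t *dst[4], uint64_t reg)
{
    uint32_t gpr = (uint32_t)reg;       /* movd  %6, %5          */
    reg >>= 32;                         /* psrlq %5, 32 / psrldq */
    store_word(dst[0], (uint16_t)gpr);  /* mov   %1, %6w         */
    gpr >>= 16;                         /* shr   %6, 16          */
    store_word(dst[1], (uint16_t)gpr);  /* mov   %2, %6w         */
    gpr = (uint32_t)reg;                /* movd  %6, %5          */
    store_word(dst[2], (uint16_t)gpr);  /* mov   %3, %6w         */
    gpr >>= 16;                         /* shr   %6, 16          */
    store_word(dst[3], (uint16_t)gpr);  /* mov   %4, %6w         */
}

/* Model of STORE_4_WORDS_SSE4: one word extract per destination,
 * with no round trip through a scratch GPR. */
static void store_4_words_pextrw(uint8_t *dst[4], uint64_t reg)
{
    for (int i = 0; i < 4; i++)
        store_word(dst[i], (uint16_t)(reg >> (16 * i))); /* pextrw */
}
```

Both functions write the same four words; the difference in the asm is
only in how many instructions and scratch registers it takes to get
there.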

Dark Shikari