[FFmpeg-devel] [PATCH] mmx implementation of vc-1 inverse transformations
Victor Pollex
victor.pollex
Mon Jul 7 21:02:28 CEST 2008
Michael Niedermayer schrieb:
> On Thu, Jul 03, 2008 at 02:51:18PM +0200, Victor Pollex wrote:
>
>> Michael Niedermayer schrieb:
>>
>>> On Sat, Jun 28, 2008 at 12:31:41PM +0200, Victor Pollex wrote:
>>>
> [...]
>
>>> [...]
>>>
>>>
>>>
>>>> +#define LOAD_ADD_CLAMP_STORE_8X1(io,t0,t1,r0,r1)\
>>>> + "movq "#io", "#t0"\n\t"\
>>>> + "movq "#t0", "#t1"\n\t"\
>>>> + "punpcklbw %%mm7, "#t0"\n\t"\
>>>> + "punpckhbw %%mm7, "#t1"\n\t"\
>>>> + "paddw "#r0", "#t0"\n\t"\
>>>> + "paddw "#r1", "#t1"\n\t"\
>>>> + "packuswb "#t1", "#t0"\n\t"\
>>>> + "movq "#t0", "#io"\n\t"
>>>>
>>>>
>>> some of the movq seem redundant
>>>
>>>
>> I'm sorry but I don' see it. Although this is now obsolete, I'd still like
>> to know which one is redundant and why.
>>
>
> i thought io was a register, if it is memory then things are different
> sorry
>
>
> [...]
>
> the LOAD4/STORE4 patch is ok
>
> [...]
>
>
>> +/*
>> + postcondition:
>> + dst0 = [15:0](dst0 + src);
>> + dst1 = [15:0](dst1 + src);
>> + dst2 = [15:0](dst2 - src);
>> + dst3 = [15:0](dst3 - src);
>> +*/
>> +#define ADD2SUB2(src, dst0, dst1, dst2, dst3)\
>> + ADD1SUB1(src, dst0, dst2)\
>> + ADD1SUB1(src, dst1, dst3)
>>
>
> This occurs 6 times, thus it safes 6 ADD1SUB1 lines while the macro
> with documentation needs 10 lines thus overall this is a loss, not only
> is it one more macro the reader has to understand it is also more code
> with the macro than without
>
removed macro and replaced all occurences with two ADD1SUB1
>
> [...]
>
>> +/*
>> + precodition:
>> + for all values v in r0, r1, r2, r3: -3971 <= v <= 3971
>> +
>> + postcondition:
>> + r3 = ((17 * (r0 + r2) + (22 * r1 + 10 * r3) + c) >> 3)
>> + r4 = ((17 * (r0 - r2) - (10 * r1 - 22 * r3) + c) >> 3)
>> + r1 = ((17 * (r0 - r2) + (10 * r1 - 22 * r3) + c) >> 3)
>> + r2 = ((17 * (r0 + r2) - (22 * r1 + 10 * r3) + c) >> 3)
>> + r0 undefined
>> + r5 undefined
>> + r6 undefined
>> + r7 undefined
>> +*/
>> +#define TRANSFORM_4X4_ROW(r0,r1,r2,r3,r4,r5,r6,r7,c)\
>> + TRANSPOSE4(r0,r1,r2,r3,r4)\
>> + TRANSFORM_4X4_COMMON(r0,r3,r4,r2,r1,r5,r6,r7,c)\
>> + "paddw "#r4", "#r4"\n\t" /* 2 * (r0 + r2) */\
>> + SUMSUB_BA(r3,r4)\
>> + "paddw "#r1", "#r3"\n\t"\
>> + "paddw "#r7", "#r4"\n\t"\
>> + "paddw "#r0", "#r0"\n\t" /* 2 * (r0 - r2) */\
>> + SUMSUB_BA(r2,r0)\
>> + "paddw "#r5", "#r0"\n\t"\
>> + "paddw "#r6", "#r2"\n\t"\
>> + TRANSPOSE4(r3,r0,r2,r4,r1)
>>
>
> It should be possible to merge one transpose into the scantble (the mpeg1/2/4
> decoder does that too)
>
>
I'm not sure if this should be done as I found the following lines in
decode_sequence_header in vc1.c
if (!v->res_fasttx)
{
v->s.dsp.vc1_inv_trans_8x8 = ff_simple_idct;
v->s.dsp.vc1_inv_trans_8x4 = ff_simple_idct84_add;
v->s.dsp.vc1_inv_trans_4x8 = ff_simple_idct48_add;
v->s.dsp.vc1_inv_trans_4x4 = ff_simple_idct44_add;
}
> [...]
>
>> +/*
>> + postcondition:
>> + r0 = [15:0](2 * r0);
>> + r1 = [15:0](3 * r0);
>> +*/
>> +#define G3X(r0,r1)\
>> + "movq "#r0", "#r1"\n\t" /* r0 */\
>> + "paddw "#r0", "#r0"\n\t" /* 2 * r0 */\
>> + "paddw "#r0", "#r1"\n\t" /* 3 * r0 */
>>
>
> 4 uses, saving 8 lines, macro with docs is 9 lines
>
removed docs
>
> [...]
>
>> +static void vc1_inv_trans_4x4_mmx(uint8_t *dest, int linesize, DCTELEM *block)
>> +{
>> + asm volatile(
>> + LOAD4(0x10,0x00(%0),%%mm0,%%mm1,%%mm2,%%mm3)
>> + TRANSFORM_4X4_ROW(%%mm0,%%mm1,%%mm2,%%mm3,%%mm4,%%mm5,%%mm6,%%mm7,%3)
>> + TRANSFORM_4X4_COL(%%mm3,%%mm4,%%mm1,%%mm2,%%mm0,%%mm5,%%mm6,%%mm7,0x08+%3)
>> + "pxor %%mm7, %%mm7\n\t"
>> + LOAD_ADD_CLAMP_STORE_4X4(%2,%1,%%mm0,%%mm4,%%mm3,%%mm2,%%mm1)
>> + :
>> + : "r"(block), "r"(dest), "r"(linesize), "m"(constants[0])
>> + : "memory"
>>
>
> you are modifying a register (%1) which is just an input, this isnt
> correct, gcc could assume it didnt change ...
> it should be "+r" not "r" and listed with the outputs.
>
> [...]
>
changed it
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: vc1_inv_trans_mmx_v5.patch
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080707/ed3b45ef/attachment.asc>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dsputil_v2.patch
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080707/ed3b45ef/attachment.txt>
More information about the ffmpeg-devel
mailing list