[FFmpeg-devel] [PATCH] VP3 DC-only IDCT

Sat Apr 17 04:05:31 CEST 2010

On Apr 16, 2010, at 3:42 PM, Michael Niedermayer wrote:

> On Fri, Apr 16, 2010 at 08:20:44AM -0400, David Conrad wrote:
>> On Mar 13, 2010, at 2:18 PM, Michael Niedermayer wrote:
>> 
>>> On Sat, Mar 13, 2010 at 01:36:20AM -0500, David Conrad wrote:
>>>> Hi,
>>>> 
>>>> This gives 2-4% faster overall decode for normal files.
>>>> 
>>>> Some thoughts:
>>>> I can't think of any shortcuts that could make the IDCT faster with 128-bit SIMD that don't rely on knowing the last non-zero coefficient.
>>>> 
>>>> Knowing that before calling the idct, you could do a slightly faster IDCT that assumes the right and bottom of the block are all 0. This seems to be significantly faster only for mmx; for sse2 it's nearly a wash between the added check vs. the time saved.
>>>> 
>>>> For an average video, around a third of all idcts are DC-only, a third more could be done with that shortcut (i.e. last_nnz is under 10), and the rest require a full IDCT.
>>>> 
>>>> libtheora only does the 10-element shortcut, not DC-only. It also only has an MMX IDCT.
>>>> 
>>>> I also haven't really looked at whether a DC-only IDCT is beneficial for mpeg codecs, thus the vp3-specific dsputil function.
>>>> 
> 
> [...]
>> @@ -1468,10 +1468,13 @@ static void render_slice(Vp3DecodeContext *s, int slice)
>>                             stride,
>>                             block);
>>                     } else {
>> +                        if (vp3_dequant(s, s->all_fragments + i, plane, 1, block))
>>                         s->dsp.idct_add(
>>                             output_plane + first_pixel,
>>                             stride,
>>                             block);
>> +                        else
> 
> nitpick: {}

Done

>> +                            s->dsp.vp3_idct_dc_add(output_plane + first_pixel, stride, block);
>>                     }
>>                 } else {
>> 
>> diff --git a/libavcodec/vp3dsp.c b/libavcodec/vp3dsp.c
>> index 87b64de..606e361 100644
>> --- a/libavcodec/vp3dsp.c
>> +++ b/libavcodec/vp3dsp.c
> 
>> @@ -223,6 +223,25 @@ void ff_vp3_idct_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*
>>     idct(dest, line_size, block, 2);
>> }
>> 
>> +void ff_vp3_idct_dc_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*align 16*/){
> 
> const block

Added

>> +    const uint8_t *cm = ff_cropTbl + MAX_NEG_CROP;
>> +    int i, dc = block[0];
>> +    dc = (46341*dc)>>16;
> 
>> +    dc = (46341*dc)>>16;
>> +    dc = (dc + 8) >> 4;
> 
> mergeable

Done
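
For readers following the "mergeable" remark, here is a minimal sketch of one way the last two scaling steps can be folded into a single shift. It is not necessarily the code that was committed. The names clip_u8, dc_scale_unmerged, dc_scale_merged and idct_dc_add_sketch are made up for illustration, int16_t stands in for DCTELEM, and an explicit clamp stands in for the ff_cropTbl lookup. It assumes right shifts of negative ints are arithmetic, as on the usual compilers.

```c
#include <stdint.h>

/* Clamp to 0..255; a stand-in for the crop-table lookup (assumption). */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scaling as written in the quoted hunk: two multiplies by
 * 46341 (about 65536 * cos(pi/4)), then round and shift by 4. */
static int dc_scale_unmerged(int dc)
{
    dc = (46341 * dc) >> 16;
    dc = (46341 * dc) >> 16;
    dc = (dc + 8) >> 4;
    return dc;
}

/* Same result with the second multiply and the final rounding merged:
 * ((x >> 16) + 8) >> 4 == (x + (8 << 16)) >> 20 for arithmetic shifts. */
static int dc_scale_merged(int dc)
{
    dc = (46341 * dc) >> 16;
    return (46341 * dc + (8 << 16)) >> 20;
}

/* Add the scaled DC value to every pixel of an 8x8 block. */
static void idct_dc_add_sketch(uint8_t *dest, int line_size,
                               const int16_t *block)
{
    int dc = dc_scale_merged(block[0]);

    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dest[x] = clip_u8(dest[x] + dc);
        dest += line_size;
    }
}
```

The intermediate products stay within 32-bit int range for any 16-bit input, so the merged form needs no wider type.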

> rest ok as far as I am concerned, but it's maintained by others ...

ARM stuff was OKed by Mans, so I went ahead and applied.
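
As a rough illustration of the three cases described in the original message (DC-only, the 10-coefficient shortcut, and the full IDCT), a selector keyed on the last non-zero coefficient might look like the sketch below. The enum, the function name choose_idct and the convention that last_nnz is a zigzag-order index (0 meaning only the DC coefficient is set) are all assumptions for illustration; the applied patch only adds the DC-only path.

```c
/* Hypothetical classification of an 8x8 block by its last non-zero
 * coefficient index in zigzag order (assumed convention). */
enum idct_kind {
    IDCT_DC_ONLY,   /* only coefficient 0 is non-zero */
    IDCT_FIRST_10,  /* non-zero coefficients confined to indices 0..9 */
    IDCT_FULL       /* anything else */
};

static enum idct_kind choose_idct(int last_nnz)
{
    if (last_nnz == 0)
        return IDCT_DC_ONLY;
    if (last_nnz < 10)
        return IDCT_FIRST_10;
    return IDCT_FULL;
}
```

The first ten zigzag positions all fall in the top-left 4x4 corner of the block, which is what lets the shortcut treat the right and bottom of the block as zero.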