[FFmpeg-devel] [PATCH] VP3 DC-only IDCT
David Conrad
lessen42
Sat Apr 17 04:05:31 CEST 2010
On Apr 16, 2010, at 3:42 PM, Michael Niedermayer wrote:
> On Fri, Apr 16, 2010 at 08:20:44AM -0400, David Conrad wrote:
>> On Mar 13, 2010, at 2:18 PM, Michael Niedermayer wrote:
>>
>>> On Sat, Mar 13, 2010 at 01:36:20AM -0500, David Conrad wrote:
>>>> Hi,
>>>>
>>>> This makes overall decoding 2-4% faster for typical files.
>>>>
>>>> Some thoughts:
>>>> I can't think of any shortcuts that could make the IDCT faster with 128-bit SIMD that don't rely on knowing the last non-zero coefficient.
>>>>
>>>> Knowing that before calling the IDCT, you could use a slightly faster IDCT that assumes the right and bottom halves of the block are all 0. This seems to be significantly faster only for MMX; for SSE2 the added check and the time saved nearly cancel out.
>>>>
>>>> For an average video, around a third of all IDCTs are DC-only, another third could use that shortcut (i.e. last_nnz is under 10), and the rest require a full IDCT.
>>>>
>>>> libtheora only does the 10-element shortcut, not DC-only. It also only has an MMX IDCT.
>>>>
>>>> I also haven't really looked at whether a DC-only IDCT is beneficial for MPEG codecs, hence the VP3-specific dsputil function.
>>>>
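The 10-element shortcut depends on the coefficient scan order. Assuming VP3 uses the standard JPEG-style 8x8 zig-zag scan (an assumption in this sketch, not something stated in the thread), scan positions 0 through 9 never leave the top-left 4x4 corner of the block, so when last_nnz is under 10 the input rows and columns 4-7 are known to be zero. The C sketch below checks that claim; build_zigzag and max_extent are hypothetical helpers, not FFmpeg functions.

```c
#include <assert.h>

/* Standard JPEG-style 8x8 zig-zag scan (ASSUMED to match VP3's order):
 * zz[k] is the raster index (row * 8 + col) of the k-th coefficient. */
static void build_zigzag(int zz[64])
{
    int k = 0;
    for (int s = 0; s < 15; s++) {              /* s = row + col (anti-diagonal) */
        if (s % 2 == 0) {
            /* even anti-diagonals: walk from bottom-left up to top-right */
            for (int row = s < 8 ? s : 7; row >= 0 && s - row < 8; row--)
                zz[k++] = row * 8 + (s - row);
        } else {
            /* odd anti-diagonals: walk from top-right down to bottom-left */
            for (int col = s < 8 ? s : 7; col >= 0 && s - col < 8; col--)
                zz[k++] = (s - col) * 8 + col;
        }
    }
    assert(k == 64);                            /* every coefficient visited once */
}

/* Largest row or column index touched by scan positions 0..n-1.
 * If the last non-zero coefficient sits at a scan position below n,
 * every input row and column past this value is all zero. */
static int max_extent(const int zz[64], int n)
{
    int m = 0;
    for (int k = 0; k < n; k++) {
        int row = zz[k] / 8, col = zz[k] % 8;
        if (row > m) m = row;
        if (col > m) m = col;
    }
    return m;
}
```

With the table built, max_extent(zz, 10) returns 3, while max_extent(zz, 11) returns 4 because scan position 10 is the coefficient at row 4, column 0. That is why the cutoff sits at 10 coefficients rather than somewhere else.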
>
> [...]
>> @@ -1468,10 +1468,13 @@ static void render_slice(Vp3DecodeContext *s, int slice)
>> stride,
>> block);
>> } else {
>> + if (vp3_dequant(s, s->all_fragments + i, plane, 1, block))
>> s->dsp.idct_add(
>> output_plane + first_pixel,
>> stride,
>> block);
>> + else
>
> nitpick: {}
Done
>> + s->dsp.vp3_idct_dc_add(output_plane + first_pixel, stride, block);
>> }
>> } else {
>>
>> diff --git a/libavcodec/vp3dsp.c b/libavcodec/vp3dsp.c
>> index 87b64de..606e361 100644
>> --- a/libavcodec/vp3dsp.c
>> +++ b/libavcodec/vp3dsp.c
>
>> @@ -223,6 +223,25 @@ void ff_vp3_idct_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*
>> idct(dest, line_size, block, 2);
>> }
>>
>> +void ff_vp3_idct_dc_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*align 16*/){
>
> const block
Added
>> + const uint8_t *cm = ff_cropTbl + MAX_NEG_CROP;
>> + int i, dc = block[0];
>> + dc = (46341*dc)>>16;
>
>> + dc = (46341*dc)>>16;
>> + dc = (dc + 8) >> 4;
>
> mergeable
Done
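One reading of the "mergeable" note: because arithmetic right shifts round toward minus infinity, (y + 8) >> 4 with y = x >> 16 equals (x + (8 << 16)) >> 20, so the last multiply-and-shift and the rounding step can become a single line. The sketch below is a plain-C illustration under stated assumptions, not the committed FFmpeg function: clip_uint8 stands in for the ff_cropTbl lookup, int16_t stands in for DCTELEM, and dc_unmerged, dc_merged and idct_dc_add_sketch are hypothetical names. It also assumes the compiler implements >> on negative ints as an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to 0..255; a stand-in for FFmpeg's ff_cropTbl lookup. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Unmerged form as quoted in the patch: two multiplies by
 * 46341/65536 (about cos(pi/4)), then round and shift by 4. */
static int dc_unmerged(int dc)
{
    dc = (46341 * dc) >> 16;
    dc = (46341 * dc) >> 16;
    return (dc + 8) >> 4;
}

/* Merged form: adding 8 before >> 4 is the same as adding 8 << 16
 * before >> 20 (assumes >> on negative ints is an arithmetic shift). */
static int dc_merged(int dc)
{
    dc = (46341 * dc) >> 16;
    return (46341 * dc + (8 << 16)) >> 20;
}

/* Hypothetical DC-only IDCT-and-add: with every AC coefficient zero,
 * all 64 output samples share one value, so add it to the 8x8
 * destination block with saturation instead of running a full IDCT. */
static void idct_dc_add_sketch(uint8_t *dest, int line_size, const int16_t *block)
{
    int dc = dc_merged(block[0]);
    assert(line_size >= 8);
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++)
            dest[j] = clip_uint8(dest[j] + dc);
        dest += line_size;
    }
}
```

For example, with block[0] = 1000 both helpers return 31, so an 8x8 region whose pixels are all 250 saturates to 255 while pixels outside the 8 columns are left untouched.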
> The rest is OK as far as I am concerned, but it's maintained by others ...
The ARM code was OKed by Mans, so I went ahead and applied the patch.