[FFmpeg-devel] [PATCH] VP3 DC-only IDCT
Michael Niedermayer
michaelni
Fri Apr 16 21:42:30 CEST 2010
On Fri, Apr 16, 2010 at 08:20:44AM -0400, David Conrad wrote:
> On Mar 13, 2010, at 2:18 PM, Michael Niedermayer wrote:
>
> > On Sat, Mar 13, 2010 at 01:36:20AM -0500, David Conrad wrote:
> >> Hi,
> >>
> >> This gives 2-4% faster overall decode for normal files.
> >>
> >> Some thoughts:
> >> I can't think of any shortcuts that could make the IDCT faster with 128-bit simd that don't rely on knowing the last non-zero coefficient.
> >>
> >> Knowing that before calling the idct, you could do a slightly faster IDCT that assumes the right and bottom of the block are all 0. This seems to be significantly faster only for mmx; for sse2 it's nearly a wash between the added check vs. the time saved.
> >>
> >> For an average video, around a third of all idcts are DC-only, a third more could be done with that shortcut (i.e. last_nnz is under 10), and the rest require a full IDCT.
> >>
> >> libtheora only does the 10 element shortcut, not DC-only. It also only has an mmx IDCT.
> >>
> >> I also haven't really looked at whether a DC-only IDCT is beneficial for mpeg codecs, thus the vp3-specific dsputil function.
> >>
> >
> > [...]
> >> diff --git a/libavcodec/vp3dsp.c b/libavcodec/vp3dsp.c
> >> index 87b64de..606e361 100644
> >> --- a/libavcodec/vp3dsp.c
> >> +++ b/libavcodec/vp3dsp.c
> >> @@ -223,6 +223,25 @@ void ff_vp3_idct_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*
> >> idct(dest, line_size, block, 2);
> >> }
> >>
> >> +void ff_vp3_idct_dc_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*align 16*/){
> >> + const uint8_t *cm = ff_cropTbl + MAX_NEG_CROP;
> >> + int i, dc = block[0];
> >
> >> + dc = (46341*dc)>>16;
> >> + dc = (46341*dc)>>16;
> >
> > me searches for a bag to vomit into ...
> > do they do all x>>1 in theora that way or just selected ones?
>
> Every multiplication in the IDCT is immediately followed by cutting the least significant 16 bits.
creepy
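
For reference, a minimal sketch of what the quoted two-step scaling computes. 46341 is roughly 65536/sqrt(2), so each step multiplies by about 0.7071 and drops the low 16 bits, and two steps land near x/2 but not exactly on x>>1. The helper names here are hypothetical and not part of the patch, and the code assumes the compiler uses an arithmetic right shift for negative ints.

```c
/* Hypothetical helper (not from the patch): one fixed-point multiply by
 * 46341/65536, which is about 1/sqrt(2), discarding the low 16 bits.
 * Assumes >> on a negative int is an arithmetic shift (floor). */
static int mul_sqrt_half(int x)
{
    return (46341 * x) >> 16;
}

/* Two passes, as in the quoted code: approximately x/2, but each pass
 * truncates, so the result can differ from x >> 1. */
static int dc_scale_twice(int x)
{
    return mul_sqrt_half(mul_sqrt_half(x));
}
```

With these definitions dc_scale_twice(100) gives 49 where 100>>1 gives 50, and dc_scale_twice(2) gives 0 where 2>>1 gives 1, so a plain shift would presumably not be bit-exact with the two multiplies.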
>
> > [...]
> >> diff --git a/libavcodec/x86/vp3dsp_mmx.c b/libavcodec/x86/vp3dsp_mmx.c
> >> index fead8e8..e39d0a1 100644
> >> --- a/libavcodec/x86/vp3dsp_mmx.c
> >> +++ b/libavcodec/x86/vp3dsp_mmx.c
> >> @@ -395,3 +395,65 @@ void ff_vp3_idct_add_mmx(uint8_t *dest, int line_size, DCTELEM *block)
> >> ff_vp3_idct_mmx(block);
> >> add_pixels_clamped_mmx(block, dest, line_size);
> >> }
> >> +
> >
> >> +void ff_vp3_idct_dc_add_mmx2(uint8_t *dest, int linesize, DCTELEM *block)
> >> +{
> >> + int dc = block[0];
> >> + dc = (46341*dc)>>16;
> >
> >> + dc = (46341*dc)>>16;
> >> + dc = (dc + 8) >> 4;
> >
> > you can merge these 2
>
> Done
[...]
> @@ -1468,10 +1468,13 @@ static void render_slice(Vp3DecodeContext *s, int slice)
> stride,
> block);
> } else {
> + if (vp3_dequant(s, s->all_fragments + i, plane, 1, block))
> s->dsp.idct_add(
> output_plane + first_pixel,
> stride,
> block);
> + else
nitpick: {}
> + s->dsp.vp3_idct_dc_add(output_plane + first_pixel, stride, block);
> }
> } else {
>
> diff --git a/libavcodec/vp3dsp.c b/libavcodec/vp3dsp.c
> index 87b64de..606e361 100644
> --- a/libavcodec/vp3dsp.c
> +++ b/libavcodec/vp3dsp.c
> @@ -223,6 +223,25 @@ void ff_vp3_idct_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*
> idct(dest, line_size, block, 2);
> }
>
> +void ff_vp3_idct_dc_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*align 16*/){
const block
> + const uint8_t *cm = ff_cropTbl + MAX_NEG_CROP;
> + int i, dc = block[0];
> + dc = (46341*dc)>>16;
> + dc = (46341*dc)>>16;
> + dc = (dc + 8) >> 4;
mergeable
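
One way to read "mergeable", as a sketch only: the rounding constant can be folded into the product before a single shift by 20. The function names are hypothetical, the input is assumed to fit in 16 bits so the product stays within a 32-bit int, and >> on a negative int is assumed to be an arithmetic shift.

```c
/* Separate form, as in the quoted lines: second multiply, then
 * round-to-nearest shift by 4. */
static int dc_round_separate(int dc)
{
    dc = (46341 * dc) >> 16;
    dc = (dc + 8) >> 4;
    return dc;
}

/* Merged form (hypothetical): adding 8 after >>16 is the same as adding
 * 8<<16 before it, and two floor shifts in a row equal one shift by the
 * sum, so a single >>20 gives the same result. */
static int dc_round_merged(int dc)
{
    return (46341 * dc + (8 << 16)) >> 20;
}
```

Both forms agree for every 16-bit input under the assumptions above, e.g. both return 4 for dc = 100.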
rest ok as far as I am concerned, but it's maintained by others ...
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Opposition brings concord. Out of discord comes the fairest harmony.
-- Heraclitus