[FFmpeg-devel] [PATCH] Mimic decoder
Michael Niedermayer
michaelni
Sun Mar 16 03:46:54 CET 2008
On Sat, Mar 15, 2008 at 06:47:03PM -0300, Ramiro Polla wrote:
> Hello,
>
> [...]
>>>>> /*
>>>>> * vlc_decode_block
>>>>> *
>>>>> * De-serialize (reconstruct) a variable length coded 8x8 block.
>>>>> */
>>>>> static int vlc_decode_block(MimicContext *ctx, DCTELEM *block, int
>>>>> num_coeffs)
>>>>> {
>>>>> unsigned int pos;
>>>>>
>>>>> memset(block, 0, 64 * sizeof(DCTELEM));
>>>> clear_blocks() in dsp
>>> clear_blocks() clears 6*64*sizeof(DCTELEM). On mimic it's only
>>> 64*sizeof(DCTELEM). Is there another function that will clear only what I
>>> need?
>> Hmm, not that i remember, anyway you can move the memset to after the idct
>> that way it will be in the cache when its cleared.
>
> Unless I misunderstood you, the blocks are on the stack, so there's no
> reason to clear them after the IDCT. If it's ok I'll leave that memset
> there.
Stacks and alignment and gcc ...
[...]
>
>>>> and
>>>> value= 1 - (1<<num_bits);
>>>> if(num_bits>1)
>>>> value+= get_bits(num_bits-1);
>>>> if(get_bits1())
>>>> value -= value;
>>>> block[ctx->scantable.permutated[pos]] = value;
>>>> Or if you think the table matters speedwise then make it static dont
>>>> duplicate it in every decoder instance.
>>> ref calc static
>>> mimic.o : 51868 51304 52192
>>> avg time: 1.692 1.712 1.728
>>>
>>> That's very odd... Shouldn't the static table be at least as fast as the
>>> current code?
>> Yes
>> If you want to debug it the asm output from gcc would be a start but i
>> doubt its worth the time spend.
>
> With a bigger sample, I got
> ref calc static
> avg time: 6.688 6.932 6.744
And the standard deviations? Mean is only one side of the story ...
>
>>> Attached are the 2 patches I used to test. (lut_calc.diff and
>>> lut_static.diff). Anyways, most FFmpeg stuff introduced a considerable
>>> speedup over the original code, so I'm tempted to apply one of these two.
>>> Could I use the hardcoded tables ifdef there?
>> I dont see much sense in it. I mean the calc code doesnt need the table at
>> all and needs around 100 byte while the table needs 1024.
>
> If it's ok I'd prefer to keep the current code with vlcdec_lookup in
> MimicContext.
You can keep it if you merge the *qscale into the table as its not
const then anymore ...
>
>> [...]
>>> if(vlc == -1)
>>> return 0;
>>> if(!vlc) /* end-of-block code */
>>> return 1;
>> maybe
>> if(vlc<=0)
>> return !vlc;
>> would be faster (no iam not suggesting this if its not faster)
>
> It's also slower (1.7%), so I'll just leave it as is.
hmm the end-of-block check should be before the invalid check at least
because it will be more often true.
also as you already tried various, what about
if(vlc<=0)
return vlc;
with the others adjusted accordingly
or
pos += vlc & (256+15);
if(pos >= 64)
return 0;
(that is leaving the end of block check but merging the vlc==-1 check with
the pos >=64 check)
This one should be faster in theory of course gcc is unpredictable
[...]
> /* FFmpeg's IDCT behaves somewhat different from the original code, so
> * a factor of 4 was added to the input */
>
> coeff = ctx->vlcdec_lookup[num_bits][value];
> if(pos<3)
> coeff <<= 4;
> else /* TODO Use >> 10 instead of / 1001 */
> coeff = (coeff * qscale) / 1001;
Can you elaborate a little about these 1024 vs. 1001 and
"different from the original" ?
[...]
> /**
> * Internal helper-function used to initialize
> * the lookup-table used by the VLC-decoder.
> */
> static void initialize_vlcdec_lookup(int8_t lookup_tbl[8][128])
> {
> int i, j;
>
> for(i = 1 ; i < 8 ; i++) {
> int first = (1<<i)-1;
> int last = 1<<(i-1);
> int cur = first;
cur - first is redundant remove one please
[...]
> ctx->picture.data[0] = NULL;
> if(avctx->get_buffer(avctx, &ctx->picture)) {
> av_log(avctx, AV_LOG_ERROR, "get_buffer() failed\n");
> return -1;
> }
reference=1 is missing before get_buffer()
[...]
> AVCodec mimic_decoder = {
> "mimic",
> CODEC_TYPE_VIDEO,
> CODEC_ID_MIMIC,
> sizeof(MimicContext),
> mimic_decode_init,
> NULL,
> mimic_decode_end,
> mimic_decode_frame
> };
CODEC_CAP_DR1 is missing
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Good people do not need laws to tell them to act responsibly, while bad
people will find a way around the laws. -- Plato
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080316/ad133b58/attachment.pgp>
More information about the ffmpeg-devel
mailing list