[FFmpeg-devel] [PATCH] Optimization of original IFF codec
Sebastian Vater
cdgs.basty
Mon Apr 26 22:06:42 CEST 2010
Hi Mans!
Måns Rullgård wrote:
> Sebastian Vater <cdgs.basty at googlemail.com> writes:
>
>> Btw, you brought me to a nice idea with your complaints...I could
>> precalculate all these values for each plane in decode_init and then
>> just memcpy it in decodeplane8/24 to local stack, what do you think of this?
>>
>
> Skip the memcpy and make the table static const.
>
>
>> This will yield 8 (planes)*4 (uint32_t's)*16 (sizeof (struct lut)) =
>> 512 bytes of tables for decodeplane8
>>
>
> 512 bytes is nothing to worry about.
>
>
>> and 24 (planes)*4 (uint32_t's)*16 (sizeof (struct lut))*4
>> (lut[0123]) = 6144 bytes.
>>
>
> 6k isn't a lot either. Just store it statically.
>
>
Bad news here...
I tried almost everything, but the new code is not faster than the old code I
had before. :-(
It just wastes memory for no gain.
I tried:
uint32_t lut[16];
memcpy (lut, decodeplane8_tab[plane], 16 * sizeof(uint32_t));
At best this is as fast as the original. Usually it is slower.
Then I tried:
const uint32_t *lut = decodeplane8_tab[plane];
The results are the same as above.
Finally, I tried indexing the table directly, without any local copy or pointer:
const uint32_t v = decodeplane8_tab[plane][get_bits(&gb, 4)];
AV_WN32A(dst+i, AV_RN32A(dst+i) | v);
This is the slowest of them all...
Please don't ask me why, but it's not worth the hassle. I think it is best
to discard the table and keep the code the way I submitted it in the
patch. :-(
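For anyone who wants to repeat the comparison, the three access patterns above can be sketched roughly as follows. This is only a minimal, self-contained sketch, not the code from the patch: `tab8`, `init_tab8`, `or_u32`, the `expand_*` functions and `time_variant` are hypothetical names, the table is filled at run time (the decode_init idea from earlier in the thread) instead of being static const, the nibbles come from a plain byte array rather than from `get_bits`, and `or_u32` stands in for the `AV_WN32A`/`AV_RN32A` pair. The timing via `clock()` is crude, and the outcome depends heavily on compiler and CPU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Run-time stand-in for decodeplane8_tab (hypothetical name). */
static uint32_t tab8[8][16];

static void init_tab8(void)
{
    for (int plane = 0; plane < 8; plane++)
        for (unsigned n = 0; n < 16; n++) {
            uint32_t v = 0;
            /* nibble bit 3 selects byte 0, ..., nibble bit 0 selects byte 3 */
            for (int k = 0; k < 4; k++)
                if (n & (8u >> k))
                    v |= (uint32_t)1 << (8 * k + plane);
            tab8[plane][n] = v;
        }
}

/* Stand-in for AV_WN32A(p, AV_RN32A(p) | v); memcpy avoids alignment issues. */
static void or_u32(uint8_t *p, uint32_t v)
{
    uint32_t x;
    memcpy(&x, p, sizeof x);
    x |= v;
    memcpy(p, &x, sizeof x);
}

/* Variant 1: copy the plane's row to the stack, then index the copy. */
static void expand_copy(uint8_t *dst, const uint8_t *nib, size_t count, int plane)
{
    uint32_t lut[16];
    memcpy(lut, tab8[plane], sizeof lut);
    for (size_t i = 0; i < count; i++)
        or_u32(dst + 4 * i, lut[nib[i] & 15]);
}

/* Variant 2: keep a pointer to the plane's row. */
static void expand_ptr(uint8_t *dst, const uint8_t *nib, size_t count, int plane)
{
    const uint32_t *lut = tab8[plane];
    for (size_t i = 0; i < count; i++)
        or_u32(dst + 4 * i, lut[nib[i] & 15]);
}

/* Variant 3: index the two-dimensional table directly. */
static void expand_direct(uint8_t *dst, const uint8_t *nib, size_t count, int plane)
{
    for (size_t i = 0; i < count; i++)
        or_u32(dst + 4 * i, tab8[plane][nib[i] & 15]);
}

typedef void (*expand_fn)(uint8_t *, const uint8_t *, size_t, int);

/* Crude measurement: CPU seconds spent in `rounds` calls of one variant. */
static double time_variant(expand_fn fn, int rounds)
{
    static uint8_t nib[4096], dst[4 * 4096];
    assert(rounds > 0);
    for (size_t i = 0; i < sizeof nib; i++)
        nib[i] = (uint8_t)(i & 15);
    clock_t t0 = clock();
    for (int r = 0; r < rounds; r++)
        fn(dst, nib, sizeof nib, r & 7);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Call `init_tab8()` once, then e.g. `time_variant(expand_ptr, 100000)` for each variant. All three write identical output, so any difference is purely in how the compiler schedules the table access.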
Here is how I initialize these tables:
#define DECODEPLANE8(plane) {0x00000000U,            \
                             0x01000000U << (plane), \
                             0x00010000U << (plane), \
                             0x01010000U << (plane), \
                             0x00000100U << (plane), \
                             0x01000100U << (plane), \
                             0x00010100U << (plane), \
                             0x01010100U << (plane), \
                             0x00000001U << (plane), \
                             0x01000001U << (plane), \
                             0x00010001U << (plane), \
                             0x01010001U << (plane), \
                             0x00000101U << (plane), \
                             0x01000101U << (plane), \
                             0x00010101U << (plane), \
                             0x01010101U << (plane)}
// 8 planes * 4-bit mask
// (unsigned constants: shifting a signed int into bit 31 for plane 7 would be undefined)
static const uint32_t decodeplane8_tab[8][16] = {DECODEPLANE8(0),
                                                 DECODEPLANE8(1),
                                                 DECODEPLANE8(2),
                                                 DECODEPLANE8(3),
                                                 DECODEPLANE8(4),
                                                 DECODEPLANE8(5),
                                                 DECODEPLANE8(6),
                                                 DECODEPLANE8(7)};
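To make the meaning of the table entries concrete, here is a small sketch of how such a table expands one bitplane into chunky 8-bit pixels. It is an illustration under assumptions, not the decodeplane8 from the patch: `lut8`, `LUT8_ROW`, `check_lut8_size` and `decodeplane8_sketch` are hypothetical names, the nibbles are taken straight from the source bytes instead of via `get_bits`, and a little-endian host is assumed (the table values put the first pixel into the low byte of the 32-bit word). The table holds the same values as the one above, just written more compactly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same values as the 8-plane table in the mail; the index is the 4-bit
 * nibble, and nibble bit 3 maps to the lowest byte of the word. */
#define LUT8_ROW(p) { \
    0x00000000u << (p), 0x01000000u << (p), 0x00010000u << (p), 0x01010000u << (p), \
    0x00000100u << (p), 0x01000100u << (p), 0x00010100u << (p), 0x01010100u << (p), \
    0x00000001u << (p), 0x01000001u << (p), 0x00010001u << (p), 0x01010001u << (p), \
    0x00000101u << (p), 0x01000101u << (p), 0x00010101u << (p), 0x01010101u << (p) }

static const uint32_t lut8[8][16] = {
    LUT8_ROW(0), LUT8_ROW(1), LUT8_ROW(2), LUT8_ROW(3),
    LUT8_ROW(4), LUT8_ROW(5), LUT8_ROW(6), LUT8_ROW(7)
};

/* 8 planes * 16 entries * 4 bytes, the 512 bytes mentioned in the thread. */
static void check_lut8_size(void)
{
    assert(sizeof lut8 == 512);
}

/* OR bit `plane` into every output byte whose bit in the bitplane row `src`
 * is set. `dst` must hold 8 * src_len bytes. Little-endian host assumed. */
static void decodeplane8_sketch(uint8_t *dst, const uint8_t *src,
                                size_t src_len, int plane)
{
    for (size_t i = 0; i < src_len; i++) {
        /* high nibble covers pixels 0..3 of this byte, low nibble pixels 4..7 */
        const unsigned nibbles[2] = { (unsigned)src[i] >> 4, src[i] & 0x0Fu };
        for (int n = 0; n < 2; n++) {
            uint8_t *p = dst + 8 * i + 4 * n;
            uint32_t px;
            memcpy(&px, p, sizeof px);      /* unaligned-safe 32-bit load */
            px |= lut8[plane][nibbles[n]];
            memcpy(p, &px, sizeof px);      /* unaligned-safe 32-bit store */
        }
    }
}
```

For example, decoding the single source byte 0xA5 (bits 1010 0101) for plane 3 into a zeroed 8-byte buffer sets bit 3 in pixels 0, 2, 5 and 7 and leaves the others at zero.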
// 24 planes * 4 lookup tables each * 4-bit mask
#define DECODEPLANE24(plane) {{0, \
0, \
0, \
0, \
0, \
0, \
0, \
0, \
1 << plane, \
1 << plane, \
1 << plane, \
1 << plane, \
1 << plane, \
1 << plane, \
1 << plane, \
1 << plane}, \
{0, \
0, \
0, \
0, \
1 << plane, \
1 << plane, \
1 << plane, \
1 << plane, \
0, \
0, \
0, \
0, \
1 << plane, \
1 << plane, \
1 << plane, \
1 << plane}, \
{0, \
0, \
1 << plane, \
1 << plane, \
0, \
0, \
1 << plane, \
1 << plane, \
0, \
0, \
1 << plane, \
1 << plane, \
0, \
0, \
1 << plane, \
1 << plane}, \
{0, \
1 << plane, \
0, \
1 << plane, \
0, \
1 << plane, \
0, \
1 << plane, \
0, \
1 << plane, \
0, \
1 << plane, \
0, \
1 << plane, \
0, \
1 << plane}}
static const uint32_t decodeplane24_tab[24][4][16] = {DECODEPLANE24( 0),
                                                      DECODEPLANE24( 1),
                                                      DECODEPLANE24( 2),
                                                      DECODEPLANE24( 3),
                                                      DECODEPLANE24( 4),
                                                      DECODEPLANE24( 5),
                                                      DECODEPLANE24( 6),
                                                      DECODEPLANE24( 7),
                                                      DECODEPLANE24( 8),
                                                      DECODEPLANE24( 9),
                                                      DECODEPLANE24(10),
                                                      DECODEPLANE24(11),
                                                      DECODEPLANE24(12),
                                                      DECODEPLANE24(13),
                                                      DECODEPLANE24(14),
                                                      DECODEPLANE24(15),
                                                      DECODEPLANE24(16),
                                                      DECODEPLANE24(17),
                                                      DECODEPLANE24(18),
                                                      DECODEPLANE24(19),
                                                      DECODEPLANE24(20),
                                                      DECODEPLANE24(21),
                                                      DECODEPLANE24(22),
                                                      DECODEPLANE24(23)};
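The 24-plane table follows a simple rule: LUT k of a plane holds `1 << plane` exactly where bit (3 - k) of the nibble is set. The sketch below generates the same values from that rule and shows one way such a table could be applied to 32-bit pixels. The mail does not show the lookup code for the 24-bit case, so the indexing `dst[...] |= lut24[plane][k][nibble]` is an assumption, and `LUT24_BIT`, `LUT24_ROW`, `LUT24_PLANE`, `lut24`, `check_lut24_size` and `decodeplane24_sketch` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Entry for nibble n in LUT k of plane p: bit (3 - k) of n, moved to bit p. */
#define LUT24_BIT(n, k, p) ((uint32_t)(((n) >> (3 - (k))) & 1) << (p))
#define LUT24_ROW(k, p) { \
    LUT24_BIT( 0, k, p), LUT24_BIT( 1, k, p), LUT24_BIT( 2, k, p), LUT24_BIT( 3, k, p), \
    LUT24_BIT( 4, k, p), LUT24_BIT( 5, k, p), LUT24_BIT( 6, k, p), LUT24_BIT( 7, k, p), \
    LUT24_BIT( 8, k, p), LUT24_BIT( 9, k, p), LUT24_BIT(10, k, p), LUT24_BIT(11, k, p), \
    LUT24_BIT(12, k, p), LUT24_BIT(13, k, p), LUT24_BIT(14, k, p), LUT24_BIT(15, k, p) }
#define LUT24_PLANE(p) { LUT24_ROW(0, p), LUT24_ROW(1, p), LUT24_ROW(2, p), LUT24_ROW(3, p) }

static const uint32_t lut24[24][4][16] = {
    LUT24_PLANE( 0), LUT24_PLANE( 1), LUT24_PLANE( 2), LUT24_PLANE( 3),
    LUT24_PLANE( 4), LUT24_PLANE( 5), LUT24_PLANE( 6), LUT24_PLANE( 7),
    LUT24_PLANE( 8), LUT24_PLANE( 9), LUT24_PLANE(10), LUT24_PLANE(11),
    LUT24_PLANE(12), LUT24_PLANE(13), LUT24_PLANE(14), LUT24_PLANE(15),
    LUT24_PLANE(16), LUT24_PLANE(17), LUT24_PLANE(18), LUT24_PLANE(19),
    LUT24_PLANE(20), LUT24_PLANE(21), LUT24_PLANE(22), LUT24_PLANE(23)
};

/* 24 planes * 4 LUTs * 16 entries * 4 bytes, the 6144 bytes from the thread. */
static void check_lut24_size(void)
{
    assert(sizeof lut24 == 6144);
}

/* OR bit `plane` into every 32-bit pixel whose bit in the bitplane row `src`
 * is set. `dst` must hold 8 * src_len pixels. */
static void decodeplane24_sketch(uint32_t *dst, const uint8_t *src,
                                 size_t src_len, int plane)
{
    for (size_t i = 0; i < src_len; i++) {
        /* high nibble covers pixels 0..3 of this byte, low nibble pixels 4..7 */
        const unsigned nibbles[2] = { (unsigned)src[i] >> 4, src[i] & 0x0Fu };
        for (int n = 0; n < 2; n++)
            for (int k = 0; k < 4; k++)
                dst[8 * i + 4 * n + k] |= lut24[plane][k][nibbles[n]];
    }
}
```

Since every entry is either 0 or `1 << plane`, the lookup replaces nothing more than a shift and a mask per pixel, which fits the observation that the table brings no measurable speedup.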
--
Best regards,
:-) Basty/CDGS (-:
Why am I spiritual? Quite simply, because I would rather search for
the formula for world peace than for the theory of everything.