[FFmpeg-devel] Indeo3 replacement, take 3
Vitor Sessak
vitor1001
Fri Nov 6 00:39:55 CET 2009
Maxim wrote:
> Hi Michael,
>
> many thanks for your review! Below some question/comments...
>
>> [...]
>>
>>
>>> //! static vq delta tables (initialized at startup)
>>> static int32_t delta_tabs [24][256]; ///< vq delta tables for all 4x4 block modes
>>> static uint8_t selector_tabs[24][256]; ///< each entry says which delta type each code is of
>>>
>>> //! vq delta tables for the mode 10 (8x8 blocks)
>>> static DECLARE_ALIGNED_8(uint64_t, delta_m10_tab[24][256]);
>>>
>>>
>>> /**
>>> * Build decoder delta tables dynamically.
>>> */
>>> static av_cold void build_delta_tables(void)
>>> {
>>> int n, i, j, quads, swap;
>>> int32_t a, b, corr;
>>> int32_t *delta_tab, *quad_tab;
>>> const vqtEntry *rom_tab;
>>> const int8_t *delta_rom;
>>> int8_t *sel_tab;
>>> int64_t *delta_64b;
>>>
>>> for (n = 0; n < 24; n++) {
>>> rom_tab = &vq_delta_rom[n];
>>> delta_rom = rom_tab->delta_tab;
>>>
>>> delta_tab = &delta_tabs [n][0];
>>> sel_tab = &selector_tabs[n][0];
>>> delta_64b = &delta_m10_tab[n][0];
>>>
>>> /* set all code selectors = forbidden */
>>> memset(sel_tab, RLE_FORBIDDEN, 249);
>>>
>>> /* initialize selectors for RLE escape codes (0xF8...0xFF) */
>>> sel_tab[249] = RLE_ESC_F9;
>>> sel_tab[250] = RLE_ESC_FA;
>>> sel_tab[251] = RLE_ESC_FB;
>>> sel_tab[252] = RLE_ESC_FC;
>>> sel_tab[253] = RLE_ESC_FD;
>>> sel_tab[254] = RLE_ESC_FE;
>>> sel_tab[255] = RLE_ESC_FF;
>>>
>>> for (i = 0; i < rom_tab->num_dyads; i++) {
>>> /* get two vq deltas from the ROM table */
>>> a = delta_rom[i * 2];
>>> b = delta_rom[i * 2 + 1];
>>> if (!HAVE_BIGENDIAN)
>>> FFSWAP(int32_t, a, b);
>>>
>>> /* concatenate two vq deltas to form a two-pixel corrector (dyad) */
>>> delta_tab[i] = (a << 8) + b;
>>>
>> iam not sure if i would call this concatenation, considering these are
>> signed variables and can be negative
>>
>> also it seems to me that the delta_rom tables are never used directly
>> but rather just the softSIMD twistet form of them, if so this is a
>> waste of memory, the softSIMD twiseted form could be stored directly
>>
>
> Those "softSIMD twisted" tables are build depends on endianess of the
> particular machine. We need to store two different tables, one for each
> endianess type. This would indeed require less memory but would end up
> in the huge file like "indeo3data.h" from the current svn...
> Moreover the twisted tables are less readable IMHO...
There is also the possibility of doing the following (untested):
#if HAVE_BIGENDIAN
#define SS(a,b) ((a << 8) + (b))
#else
#define SS(a,b) ((b << 8) + (a))
#endif
static const int32_t delta_tabs [24][256] = {
{
SS(0,0), SS(2,2), ....
}
>>
>>> /* those two values are different in the binary decoder, no idea why */
>>> /* patch them in order to make the output bitexact */
>>> requant_tab[1][7] = 10;
>>> requant_tab[4][8] = 10;
>>>
>> are these the only differences in the tables?
>> or just the only used by the files you tested?
>> also from the comment i assume your decoder is bitexact, is it?
>>
> Those tables are hardcoded in Xanim, Windows and Mac libraries therefore
> I could compare them easily...
> The equation above was found from the table values using the "try and
> error" approach...
> And yes, the decoder is bitexact at this moment compared to Win32 output!
>
>> and the code above is not thread safe (a simple solution is to
>> never write to any spot in a global array a value that differs
>> from what you previously wrote there)
>>
>
> What would be a solution for that? The use of hardcoded tables?
I think that static tables are always initialized by the OS with zeros,
so you can do something like (also untested)
> static uint8_t coef1[] = {1, 1, 2, -3, -3, 3, 4, 4};
> static uint8_t coef2[] = {0, 1, 0, 4, 4, 1, 0, 1};
>
> /* some last elements calculated above will have values >= 128 */
> /* pixel values shall never exceed 127 so set them to non-overflowing values */
> /* according with the quantization step of the respective section */
> requant_tab[0][127] = 126;
> requant_tab[1][119] = 118;
> requant_tab[1][120] = 118;
> requant_tab[2][126] = 124;
> requant_tab[2][127] = 124;
> requant_tab[6][124] = 120;
> requant_tab[6][125] = 120;
> requant_tab[6][126] = 120;
> requant_tab[6][127] = 120;
>
> /* those two values are different in the binary decoder, no idea why */
> /* patch them in order to make the output bitexact */
> requant_tab[1][7] = 10;
> requant_tab[4][8] = 10;
>
> for (i = 0; i < 128; i++)
> for(j = 0; j < 8; j++)
> if (!requant_tab[j][i])
> requant_tab[j][i] = (i + coef1[j]) / j * j + coef2[j];
And IMO it should be thread-safe.
>>
>>> skip_bits = 0;
>>> need_resync = 0;
>>>
>>> /* init binary tree decoding */
>>> curr_cell = ctx->cell_stack;
>>>
>>> /* initialize the 1st cell and set its dimensions to whole plane */
>>> curr_cell->xpos = curr_cell->ypos = 0;
>>> curr_cell->width = plane->width >> 2;
>>> curr_cell->height = plane->height >> 2;
>>> curr_cell->tree = 0; // we are in the MC tree now
>>> curr_cell->mv_ptr = 0; // no motion vector = INTRA cell
>>>
>>> /* process the plane's binary tree until it gets exhausted */
>>> while (curr_cell >= ctx->cell_stack) {
>>> if (need_resync && !(get_bits_count(&gb) & 7)) {
>>> skip_bits_long(&gb, skip_bits);
>>> skip_bits = 0;
>>> need_resync = 0;
>>> }
>>> switch (get_bits(&gb, 2)) {
>>> case H_SPLIT:
>>> /* split current cell into two vertical subcells */
>>> prev_cell = curr_cell;
>>> if (curr_cell >= &ctx->cell_stack[CELL_STACK_MAX]) {
>>> av_log(avctx, AV_LOG_ERROR, "Cell stack overflow (corrupted binary tree)!\n");
>>> return -1;
>>> }
>>> DUPLICATE_CELL(curr_cell);
>>> SPLIT_CELL(prev_cell->height, curr_cell->height);
>>> prev_cell->ypos += curr_cell->height;
>>> prev_cell->height -= curr_cell->height;
>>> break;
>>>
>> this function likely can be implemented much cleaner recursively instead of
>> this synthetic stack thing.
>>
>
> Ok, how to prevent the stack in a recursive function from possible
> corruption in the case of damaged binary tree? Check a global counter
> locked by a mutex?
I think it might be easier to pass to the recursive func a parameter
with the maximum recursion depth and decrement it before calling itself
recursively.
-Vitor
More information about the ffmpeg-devel
mailing list