[FFmpeg-devel] [PATCH 2/3] Indeo 5 decoder: common DSP functions

Sun Jan 17 20:32:05 CET 2010

Do these functions have any alignment requirements?
Or can you promise a certain alignment on the inputs?
For anything that's speed-critical such documentation might
speed up future optimization attempts.

On Fri, Jan 15, 2010 at 12:05:47PM +0200, Kostya wrote:
> /**
>  *  two-dimensional inverse slant 8x8 transform
>  *
>  *  @param  in      [in]  pointer to the vector of transform coefficients
>  *  @param  out     [out] pointer to the output buffer (frame)
>  *  @param  pitch   [in]  pitch to move to the next y line
>  *  @param  flags   [in]  pointer to the array of column flags:
>  *                        != 0 - non_empty column, 0 - empty one
>  *                        (this array must be filled by caller)
>  */
> void ff_ivi_inverse_slant_8x8(int32_t *in, int16_t *out, uint32_t pitch, const uint8_t *flags);
> 
> /**
>  *  two-dimensional inverse slant 4x4 transform
>  *
>  *  @param  in      [in]  pointer to the vector of transform coefficients
>  *  @param  out     [out] pointer to the output buffer (frame)
>  *  @param  pitch   [in]  pitch to move to the next y line
>  *  @param  flags   [in]  pointer to the array of column flags:
>  *                        != 0 - non_empty column, 0 - empty one
>  *                        (this array must be filled by caller)
>  */
> void ff_ivi_inverse_slant_4x4(int32_t *in, int16_t *out, uint32_t pitch, const uint8_t *flags);

Why are those "in" not const if they are really only input as the documentation says?
The same for many more functions here.

> /**
>  *  This is a speed-up version of the inverse 2D slant transforms
>  *  for the case if there is a non-zero DC coeff and all AC coeffs are zero.

Hm? I think "sped-up" might be grammatically correct, but how about
"A inverse 2D slant transform optimized for the all-zero AC coefficients case"
(is the non-zero DC relevant here?)?

> /**
>  *  Copies the pixels into the frame buffer.
>  */
> void ff_ivi_put_pixels_8x8(int32_t *in, int16_t *out, uint32_t pitch, const uint8_t *flags);
> 
> /**
>  *  Copies the DC coefficient into the first pixel of the block and
>  *  zeroes all others.
>  */
> void ff_ivi_put_dc_pixel_8x8(const int32_t *in, int16_t *out, uint32_t pitch, int blk_size);
> 
> /**
>  *  8x8 block motion compensation with adding delta

If I remember right, there are dsputils functions that do something similar.
Describing how these differ might be helpful.

> /** butterfly operation for the inverse slant transform */
> #define IVI_SLANT_BFLY(x, y) t1 = x-y; x += y; y = t1;
> 
> /** This is a reflection a,b = 1/2, 5/4 for the inverse slant transform */
> #define IVI_IREFLECT(s1, s2) {\
>     t1 = s1 + ((s1 + s2*2 + 2) >> 2);\
>     s2 = ((s1*2 - s2 + 2) >> 2) - s2;\
>     s1 = t1;}
> 
> /** This is a reflection a,b = 1/2, 7/8 for the inverse slant transform */
> #define IVI_SLANT_PART4(s1, s2) {\
>     t1 = s2 + ((s1*4 - s2 + 4)  >> 3);\
>     s2 = ((-s1 - s2*4 + 4) >> 3) + s1;\
>     s1 = t1;}

You're supposed to put the arguments in () for macros to avoid
really nasty to debug issues.
Also, why is t1 nor declared inside the macro (of course better with
a name that is less likely to cause name clashes).
Well, in the IVI_INV_SLANT8 macros probably would still be ok, too,
but having a large function with a variable that is only modified
hidden in some macros is not so great.

>             memset(out, 0, 8*sizeof(int16_t));

sizeof(*out) maybe?
There's the same code another time later

> void ff_ivi_dc_slant_2d(const int32_t *in, int16_t *out, uint32_t pitch, int blk_size)
> {
>     int     x, y;
>     int16_t dc_coeff;
> 
>     dc_coeff = (*in + 1) >> 1;
> 
>     for (y = 0; y < blk_size; out += pitch, y++) {
>         for (x = 0; x < blk_size; x++)
>             out[x] = dc_coeff;
>     }
> }

Seems like we could really use some memset function that sets not just 1-byte values...
Assuming we have some appropriate constraints on out alignmen and blk_size this sure
could be optimized anyway.

> void ff_ivi_dc_row_slant(const int32_t *in, int16_t *out, uint32_t pitch, int blk_size)

Same applies here.

>     int     i, t1, row2, row4, row8;
> 
>             out[0] = out[pitch] = out[row2] = out[row2 + pitch] = out[row4] =
>                     out[row4 + pitch] =  out[row4 + row2] = out[row8 - pitch] = 0;

Some really messed up alignment?

> void ff_ivi_put_pixels_8x8(int32_t *in, int16_t *out, uint32_t pitch,
>                            const uint8_t *flags)
> {
>     int     x, y;
> 
>     for (y = 0; y < 8; out += pitch, in += 8, y++)
>         for (x = 0; x < 8; x++)
>             out[x] = in[x];

Why not use memcpy or at least use uint32_t * casts? Or is the alignment not
sufficient?

> void ff_ivi_put_dc_pixel_8x8(const int32_t *in, int16_t *out, uint32_t pitch,
>                              int blk_size)
> {
>     int     y;

Why those strange spaces?

>     out[0] = in[0];
>     memset(&out[1], 0, 7*sizeof(int16_t));

out + 1 is the more common way to write it in FFmpeg I think.

> #define op_put(a, b)  (a) = (b)
> #define op_add(a, b)  (a) += (b)

I think macros should always be uppercase.