[FFmpeg-devel] [PATCH 2/3] Indeo 5 decoder: common DSP functions
Reimar Döffinger
Reimar.Doeffinger
Sun Jan 17 20:32:05 CET 2010
Do these functions have any alignment requirements?
Or can you promise a certain alignment on the inputs?
For anything that's speed-critical such documentation might
speed up future optimization attempts.
On Fri, Jan 15, 2010 at 12:05:47PM +0200, Kostya wrote:
> /**
> * two-dimensional inverse slant 8x8 transform
> *
> * @param in [in] pointer to the vector of transform coefficients
> * @param out [out] pointer to the output buffer (frame)
> * @param pitch [in] pitch to move to the next y line
> * @param flags [in] pointer to the array of column flags:
> * != 0 - non_empty column, 0 - empty one
> * (this array must be filled by caller)
> */
> void ff_ivi_inverse_slant_8x8(int32_t *in, int16_t *out, uint32_t pitch, const uint8_t *flags);
>
> /**
> * two-dimensional inverse slant 4x4 transform
> *
> * @param in [in] pointer to the vector of transform coefficients
> * @param out [out] pointer to the output buffer (frame)
> * @param pitch [in] pitch to move to the next y line
> * @param flags [in] pointer to the array of column flags:
> * != 0 - non_empty column, 0 - empty one
> * (this array must be filled by caller)
> */
> void ff_ivi_inverse_slant_4x4(int32_t *in, int16_t *out, uint32_t pitch, const uint8_t *flags);
Why are those "in" not const if they are really only input as the documentation says?
The same for many more functions here.
> /**
> * This is a speed-up version of the inverse 2D slant transforms
> * for the case if there is a non-zero DC coeff and all AC coeffs are zero.
Hm? I think "sped-up" might be grammatically correct, but how about
"A inverse 2D slant transform optimized for the all-zero AC coefficients case"
(is the non-zero DC relevant here?)?
> /**
> * Copies the pixels into the frame buffer.
> */
> void ff_ivi_put_pixels_8x8(int32_t *in, int16_t *out, uint32_t pitch, const uint8_t *flags);
>
> /**
> * Copies the DC coefficient into the first pixel of the block and
> * zeroes all others.
> */
> void ff_ivi_put_dc_pixel_8x8(const int32_t *in, int16_t *out, uint32_t pitch, int blk_size);
>
> /**
> * 8x8 block motion compensation with adding delta
If I remember right, there are dsputils functions that do something similar.
Describing how these differ might be helpful.
> /** butterfly operation for the inverse slant transform */
> #define IVI_SLANT_BFLY(x, y) t1 = x-y; x += y; y = t1;
>
> /** This is a reflection a,b = 1/2, 5/4 for the inverse slant transform */
> #define IVI_IREFLECT(s1, s2) {\
> t1 = s1 + ((s1 + s2*2 + 2) >> 2);\
> s2 = ((s1*2 - s2 + 2) >> 2) - s2;\
> s1 = t1;}
>
> /** This is a reflection a,b = 1/2, 7/8 for the inverse slant transform */
> #define IVI_SLANT_PART4(s1, s2) {\
> t1 = s2 + ((s1*4 - s2 + 4) >> 3);\
> s2 = ((-s1 - s2*4 + 4) >> 3) + s1;\
> s1 = t1;}
You're supposed to put the arguments in () for macros to avoid
really nasty to debug issues.
Also, why is t1 nor declared inside the macro (of course better with
a name that is less likely to cause name clashes).
Well, in the IVI_INV_SLANT8 macros probably would still be ok, too,
but having a large function with a variable that is only modified
hidden in some macros is not so great.
> memset(out, 0, 8*sizeof(int16_t));
sizeof(*out) maybe?
There's the same code another time later
> void ff_ivi_dc_slant_2d(const int32_t *in, int16_t *out, uint32_t pitch, int blk_size)
> {
> int x, y;
> int16_t dc_coeff;
>
> dc_coeff = (*in + 1) >> 1;
>
> for (y = 0; y < blk_size; out += pitch, y++) {
> for (x = 0; x < blk_size; x++)
> out[x] = dc_coeff;
> }
> }
Seems like we could really use some memset function that sets not just 1-byte values...
Assuming we have some appropriate constraints on out alignmen and blk_size this sure
could be optimized anyway.
> void ff_ivi_dc_row_slant(const int32_t *in, int16_t *out, uint32_t pitch, int blk_size)
Same applies here.
> int i, t1, row2, row4, row8;
>
> out[0] = out[pitch] = out[row2] = out[row2 + pitch] = out[row4] =
> out[row4 + pitch] = out[row4 + row2] = out[row8 - pitch] = 0;
Some really messed up alignment?
> void ff_ivi_put_pixels_8x8(int32_t *in, int16_t *out, uint32_t pitch,
> const uint8_t *flags)
> {
> int x, y;
>
> for (y = 0; y < 8; out += pitch, in += 8, y++)
> for (x = 0; x < 8; x++)
> out[x] = in[x];
Why not use memcpy or at least use uint32_t * casts? Or is the alignment not
sufficient?
> void ff_ivi_put_dc_pixel_8x8(const int32_t *in, int16_t *out, uint32_t pitch,
> int blk_size)
> {
> int y;
Why those strange spaces?
> out[0] = in[0];
> memset(&out[1], 0, 7*sizeof(int16_t));
out + 1 is the more common way to write it in FFmpeg I think.
> #define op_put(a, b) (a) = (b)
> #define op_add(a, b) (a) += (b)
I think macros should always be uppercase.
More information about the ffmpeg-devel
mailing list