[Ffmpeg-devel] BlackFin lowlevel pixel operations PATCH

Sun Apr 1 14:16:36 CEST 2007

On Sun, Apr 01, 2007 at 07:34:04AM -0400, Marc Hoffman wrote:
> Marc Hoffman writes:
>  > 
>  > Diego Biurrun writes:
>  >  > On Fri, Mar 30, 2007 at 07:32:14AM -0400, Marc Hoffman wrote:
>  >  > > 
>  >  > > --- bfin/dsputil_bfin.c	(revision 8517)
>  >  > > +++ bfin/dsputil_bfin.c	(working copy)
>  >  > > @@ -18,38 +21,296 @@
>  >  > >  
>  >  > > +static void bfin_idct_add (uint8_t *dest, int line_size, DCTELEM *block)
>  >  > > +{
>  >  > > +  ff_bfin_idct (block);
>  >  > > +  ff_bfin_add_pixels_clamped (block, dest, line_size);
>  >  > > +}
>  >  > 
>  >  > Here and everywhere else: FFmpeg coding style mandates 4 space
>  >  > indentation.
>  >  > 
>  > 
>  > No problem, is that for every indentation level or just the first level?
>  > 
> dsputils indented.

Please use four spaces everywhere.
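
To make "everywhere" concrete, here is a small sketch with every nesting level indented by four spaces. It is only an illustration of the indentation: clamp_u8 and add_pixels_clamped_sketch are names made up for this example and are not the routines from the patch.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: saturate an int to 0..255. */
static uint8_t clamp_u8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical plain-C stand-in for an 8x8 "add pixels clamped" routine.
 * Each nesting level (function body, outer loop, inner loop) adds exactly
 * four spaces of indentation. */
static void add_pixels_clamped_sketch(const int16_t *block, uint8_t *dest,
                                      int line_size)
{
    int x, y;

    for (y = 0; y < 8; y++) {
        for (x = 0; x < 8; x++) {
            dest[x] = clamp_u8(dest[x] + block[x]);
        }
        block += 8;
        dest  += line_size;
    }
}
```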

Some spelling/grammar nitpicks below.

> --- bfin/fdct_bfin.S	(revision 0)
> +++ bfin/fdct_bfin.S	(revision 0)
> @@ -0,0 +1,383 @@
> +
> +This is the implementation of Chen's algorithm of DCT.  It is based on
> +the separable nature of DCT for multi- dimension. The input matrix is
> +8x8 real data. First, one dime- sional 8-point DCT is calculated for

stray -

> +transpose. Then again 8-point DCT is calculated on each row of

of the

> +matrix. The output is again stored in a transpose matrix. This is
> +final output.

the final output
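
For reference, the row/transpose scheme this comment describes can be sketched in plain C as below. This is a naive floating-point illustration under my own assumptions, not Chen's fast factorization and not the Blackfin code from the patch; the orthonormal scaling, the cosine table and all function names (cos_pi16, dct8_row, transpose8, dct8x8_rowcol) are invented for the example.

```c
#define N 8

/* cos(j*pi/16) for j = 0..8; other angles follow by symmetry. */
static const double cos_tab[9] = {
    1.0,
    0.98078528040323044,
    0.92387953251128674,
    0.83146961230254524,
    0.70710678118654752,
    0.55557023301960222,
    0.38268343236508977,
    0.19509032201612827,
    0.0
};

/* Return cos(j*pi/16) for any non-negative integer j. */
static double cos_pi16(int j)
{
    j %= 32;                      /* period 2*pi is 32 steps of pi/16 */
    if (j > 16)
        j = 32 - j;               /* cos(2*pi - x) = cos(x) */
    if (j > 8)
        return -cos_tab[16 - j];  /* cos(pi - x) = -cos(x) */
    return cos_tab[j];
}

/* Naive orthonormal 8-point DCT-II of one row, written back in place. */
static void dct8_row(double *row)
{
    double out[N];
    int k, n;

    for (k = 0; k < N; k++) {
        /* sqrt(1/8) for the DC term, sqrt(2/8) = 0.5 otherwise */
        double scale = (k == 0) ? 0.35355339059327379 : 0.5;
        double sum = 0.0;

        for (n = 0; n < N; n++)
            sum += row[n] * cos_pi16((2 * n + 1) * k);
        out[k] = scale * sum;
    }
    for (k = 0; k < N; k++)
        row[k] = out[k];
}

/* In-place transpose of an 8x8 matrix. */
static void transpose8(double m[N][N])
{
    int i, j;

    for (i = 0; i < N; i++) {
        for (j = i + 1; j < N; j++) {
            double t = m[i][j];
            m[i][j] = m[j][i];
            m[j][i] = t;
        }
    }
}

/* Two passes: 1-D DCT on every row, then transpose. The second pass
 * therefore transforms the original columns, and the second transpose
 * returns the result to normal orientation. */
static void dct8x8_rowcol(double m[N][N])
{
    int pass, i;

    for (pass = 0; pass < 2; pass++) {
        for (i = 0; i < N; i++)
            dct8_row(m[i]);
        transpose8(m);
    }
}
```

With this scaling a constant block should end up with all its energy in the DC coefficient, which makes for an easy sanity check.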

> +addition and subtraction is done with one multiplication.  In the
> +third and last (fourth) stages more MAC operations involved.

get/are involved

> +The algorithm operates in-placed.

in-place

> +transposing the matrix and calculation of bit reversed are carried out

reversal

> +Output of function is provided "in" buffer in normal order.

in "in"

> +This function takes a two argument the first of which is the base

takes two arguments, the first

> +\item[ptr\_dct\_input] \par input signal, 16bit input samples.
> +\item[ptr\_dct\_input] \par output signal, 16bit output samples.
> +\item[ptr\_dct_\coefs] \par dct coefficient matrix.
> +\item[ptr\_dct\_temp]  \par temporary output buffer for middle stage.

No need to add periods here.

> +               FP - Should point to the begining of the jpeg encode structure

begiNNing, JPEG

> +        M0 = 12 (X);                    // All these initialization are used in the

initializationS

> +        // prescale the input to get the precision correct

correct precision

> +* a loop of 2 iteration (DCT_strt, DCT_end) is set.

iterationS

> +* teration. The input is read from "in" buffer and output is written to

iteration

> +        P1 = B2;                        // P1 pointts to temporary array

points

> +* Where notation (x, y) means the element from column x is in upper half of
> +* register and element from column y is in lower half of the register.

the upper/lower half

> +*       The following two instruction does the job of stage 3 -

instructionS

> +*       A1 = Element 0 * cos(pi/4)
> +*       A0 =  Element 0 * cos(pi/4)

Align this.  Also, is there a typo?  Are A0 and A1 really the same?

> +* calculation is done outside the loop, after wards it is done here. It

afterwards

> +* serves two purpose.

purposeS

> +*  Firts it computes part 1 for the next data, and it writes the data 5, and

First

> --- bfin/idct_bfin.S	(revision 0)
> +++ bfin/idct_bfin.S	(revision 0)
> @@ -0,0 +1,422 @@
> +Description     : This is the implementation of Chen's algorithm of IDCT.

Umm, I didn't look through this in detail, but this description appears
to be a (near) duplicate of the one in bfin/fdct_bfin.S.  That's not
good.

Diego