[Ffmpeg-devel] H.264 encoder

Fri Jun 23 19:00:36 CEST 2006

Hi

On Fri, Jun 23, 2006 at 03:58:24PM +0200, Panagiotis Issaris wrote:
> Hi,
> 
> We've just put a new patch on our homepage which addresses
> the remarks made.
> 
> Direct link to the patch:
> http://research.edm.uhasselt.be/~h264/edm-20060622T170534-ffmpeg-h264.diff
> 
> Michael Niedermayer wrote:
> 
> >>>> ...
> >>>> We felt that now was a good time to ask for some additional feedback.
> >>    
> >>
> >> it contains a few pretty large tables which could maybe be generated at
> >> runtime ...
> >  
> 
> Fixed. Except for a table which contains statistically
> gathered data on the relation between the SAE and the resulting
> encoded size for a certain block. 

replace by an approximation (someting which fits in <80 chars of source), 
your values seem like they are overtrained to some data, thats not usefull
for any other data

[...]

> >> code duplication, ive not checked too carefully but loop filter & motion
> >> compensation at least appear to be duplicated from the h264 decoder
> >  
> 
> We didn't find a way to easily reuse the loop filter from the decoder, which
> is why we wrote the routine ourselves. The motion compensation code in

hmm, where is the problem?

> libavcodec depends on the MpegEncoderContext, which we're not using. For 
> this
> reason, we wrote a simple motion compensation scheme ourselves, using the
> optimized DspContext functions.

hmm, i dont remember any h264 MC functions which depend on MpegEncContext

[...]
> I'm pretty sure our codec is simpler then x264, if only for the simple
> reason
> that it is _much_ smaller in codesize and number of features as we only
> started working on it a few months ago  :) 

well, is that true even when its compared against old x264 versions?

> 
> What specifically needs to be done now to get the patch integrated in
> libavcodec?

alot, the code looks somewhat messy, more details are below,
also note that if there where many similar issues i generally
mentioned just one but for acceptance all need to be fixed

[...]
> diff -Naur ff/libavcodec/dsputil.c ff-ours/libavcodec/dsputil.c
> --- ff/libavcodec/dsputil.c	2006-05-30 07:44:22.000000000 +0200
> +++ ff-ours/libavcodec/dsputil.c	2006-06-21 16:55:41.000000000 +0200
> @@ -3769,6 +3769,243 @@
>      dest[0] = cm[dest[0] + ((block[0] + 4)>>3)];
>  }
>  
> +const int16_t MF00[6] = {13107, 11916, 10082, 9362, 8192, 7282};
> +const int16_t V00[6] = {10*16, 11*16, 13*16, 14*16, 16*16, 18*16};

these exist several times
and for non static variables their names are not that good

also maybe you could add a h264dsp.c for h264 specific code (not generic 4x4
block compare or so, just the really h264 specific stuff) dsputil.c is
already quite huge ...

> +
> +// we'll always work with transposed input blocks, to avoid having to make a distinction between
> +// C and mmx implementations
> +void ff_h264_transform_dct_quant_c(int16_t block[4][4], int QP, int dontscaleDC) // y,x indexing
> +{
> +    static const int16_t MF[6][4][4] = 
> +    {
> +        { { 13107, 8066, 13107, 8066}, {  8066, 5243,  8066, 5243}, { 13107, 8066, 13107, 8066}, {  8066, 5243,  8066, 5243} },
> +        { { 11916, 7490, 11916, 7490}, {  7490, 4660,  7490, 4660}, { 11916, 7490, 11916, 7490}, {  7490, 4660,  7490, 4660} },
> +        { { 10082, 6554, 10082, 6554}, {  6554, 4194,  6554, 4194}, { 10082, 6554, 10082, 6554}, {  6554, 4194,  6554, 4194} },
> +        { {  9362, 5825,  9362, 5825}, {  5825, 3647,  5825, 3647}, {  9362, 5825,  9362, 5825}, {  5825, 3647,  5825, 3647} },
> +        { {  8192, 5243,  8192, 5243}, {  5243, 3355,  5243, 3355}, {  8192, 5243,  8192, 5243}, {  5243, 3355,  5243, 3355} },
> +        { {  7282, 4559,  7282, 4559}, {  4559, 2893,  4559, 2893}, {  7282, 4559,  7282, 4559}, {  4559, 2893,  4559, 2893} }
> +    };
> +    int32_t qbits = 15 + QP/6;
> +    int32_t f = (1<<qbits)/3;
> +    int mod = QP%6;

maybe the unused quantize_c() from h264.c could be used or merged? 

[...]

> +void ff_h264_transform_inverse_quant_dct_add_c(int16_t block[4][4], int QP, int dontscaleDC, uint8_t *dst, int stride) // y,x indexing
> +{
> +    static const int16_t V[6][4][4] = 
> +    {
> +        { { 10*16, 13*16, 10*16, 13*16}, { 13*16, 16*16, 13*16, 16*16}, { 10*16, 13*16, 10*16, 13*16}, { 13*16, 16*16, 13*16, 16*16} },
> +        { { 11*16, 14*16, 11*16, 14*16}, { 14*16, 18*16, 14*16, 18*16}, { 11*16, 14*16, 11*16, 14*16}, { 14*16, 18*16, 14*16, 18*16} },
> +        { { 13*16, 16*16, 13*16, 16*16}, { 16*16, 20*16, 16*16, 20*16}, { 13*16, 16*16, 13*16, 16*16}, { 16*16, 20*16, 16*16, 20*16} },
> +        { { 14*16, 18*16, 14*16, 18*16}, { 18*16, 23*16, 18*16, 23*16}, { 14*16, 18*16, 14*16, 18*16}, { 18*16, 23*16, 18*16, 23*16} },
> +        { { 16*16, 20*16, 16*16, 20*16}, { 20*16, 25*16, 20*16, 25*16}, { 16*16, 20*16, 16*16, 20*16}, { 20*16, 25*16, 20*16, 25*16} },
> +        { { 18*16, 23*16, 18*16, 23*16}, { 23*16, 29*16, 23*16, 29*16}, { 18*16, 23*16, 18*16, 23*16}, { 23*16, 29*16, 23*16, 29*16} }
> +    };
> +    DCTELEM elem[4][4];
> +    int mod = QP%6;
> +
> +    if (QP >= 24)
> +    {
> +        int shift = QP/6-4;
> +        
> +        if (dontscaleDC)
> +            elem[0][0] = block[0][0];
> +        else
> +            elem[0][0] = ((int32_t)block[0][0]*V[mod][0][0]) << shift;
> +        
> +        elem[0][1] = ((int32_t)block[0][1]*V[mod][0][1]) << shift;
> +        elem[0][2] = ((int32_t)block[0][2]*V[mod][0][2]) << shift;
> +        elem[0][3] = ((int32_t)block[0][3]*V[mod][0][3]) << shift;
> +        
> +        elem[1][0] = ((int32_t)block[1][0]*V[mod][1][0]) << shift;
> +        elem[1][1] = ((int32_t)block[1][1]*V[mod][1][1]) << shift;
> +        elem[1][2] = ((int32_t)block[1][2]*V[mod][1][2]) << shift;
> +        elem[1][3] = ((int32_t)block[1][3]*V[mod][1][3]) << shift;
> +    
> +        elem[2][0] = ((int32_t)block[2][0]*V[mod][2][0]) << shift;
> +        elem[2][1] = ((int32_t)block[2][1]*V[mod][2][1]) << shift;
> +        elem[2][2] = ((int32_t)block[2][2]*V[mod][2][2]) << shift;
> +        elem[2][3] = ((int32_t)block[2][3]*V[mod][2][3]) << shift;
> +
> +        elem[3][0] = ((int32_t)block[3][0]*V[mod][3][0]) << shift;
> +        elem[3][1] = ((int32_t)block[3][1]*V[mod][3][1]) << shift;
> +        elem[3][2] = ((int32_t)block[3][2]*V[mod][3][2]) << shift;
> +        elem[3][3] = ((int32_t)block[3][3]*V[mod][3][3]) << shift;

maybe the init_dequant* stuff from h264.c could be used?

[...]

> +/**
> + * |ZD(i,j)| = (|YD(i,j)| MF(0,0) + 2 f) >> (qbits + 1)
> + *
> + */
> +void ff_h264_hadamard_quant_4x4_c(DCTELEM Y[4][4], int QP)
> +{
> +    int qbits = 15 + QP/6;
> +    int f2 = ((1 << qbits) / 3)*2;
> +    int shift = (qbits + 1);
> +    int mod = QP%6;

% and / are slow, they could be done with tables assuming the whole isnt
replaced by a table

> +
> +    int32_t MF = MF00[mod];
> +    
> +    Y[0][0] = (((ABS(Y[0][0])>>1) * MF + f2) >> shift)*(1-(((Y[0][0])>>14)&2));

shift could be made a constant and added into MF
to change the sign based on the sign of another number you can use:
(a ^ (b>>31)) - (b>>31)

> +    Y[0][1] = (((ABS(Y[0][1])>>1) * MF + f2) >> shift)*(1-(((Y[0][1])>>14)&2));
> +    Y[0][2] = (((ABS(Y[0][2])>>1) * MF + f2) >> shift)*(1-(((Y[0][2])>>14)&2));
> +    Y[0][3] = (((ABS(Y[0][3])>>1) * MF + f2) >> shift)*(1-(((Y[0][3])>>14)&2));
> +    
> +    Y[1][0] = (((ABS(Y[1][0])>>1) * MF + f2) >> shift)*(1-(((Y[1][0])>>14)&2));
> +    Y[1][1] = (((ABS(Y[1][1])>>1) * MF + f2) >> shift)*(1-(((Y[1][1])>>14)&2));
> +    Y[1][2] = (((ABS(Y[1][2])>>1) * MF + f2) >> shift)*(1-(((Y[1][2])>>14)&2));
> +    Y[1][3] = (((ABS(Y[1][3])>>1) * MF + f2) >> shift)*(1-(((Y[1][3])>>14)&2));

this stuff should eithetr be a loop or a macro or whatever, but not 16 times 
near duplicated

[...]

> +//#define DEBUG_H264CAVLC
> +#define H264_CAVLC_CLIPTABLE_SIZE        65536
> +#define H264_CAVLC_CLIPTABLE_OFFSET    32768
> +
> +static int length_table[7][4095];
> +static int code_table[7][4095];
> +int16_t ff_h264_cavlc_cliptable[H264_CAVLC_CLIPTABLE_SIZE];
> +
> +void h264cavlc_generate_cliptable()
> +{
> +	int value;
> +	for(value=-32768; value<32768; value++)
> +	{
> +		if (value < -2047)
> +			ff_h264_cavlc_cliptable[value + H264_CAVLC_CLIPTABLE_OFFSET] = -2047;
> +		else if (value > 2047)
> +			ff_h264_cavlc_cliptable[value + H264_CAVLC_CLIPTABLE_OFFSET] = 2047;
> +		else
> +			ff_h264_cavlc_cliptable[value + H264_CAVLC_CLIPTABLE_OFFSET] = value;
> +	}

is it really faster to use that table instead of if(x+2047 > 4094U) x= ...
and cliping is bad, VERY bad, you should change the mb type, qp or whatever
if you cannot encode a coefficient, of coarse if you encode several MB types
and choose the one which has the best rate+distortion at the end then cliping
isnt an issue

> +}
> +
> +void h264cavlc_generate_tables()
> +{
> +	int vlcnum, level;
> +	for (vlcnum=0; vlcnum<7; vlcnum++)
> +	{
> +		for(level=-2047; level<2048; level++)
> +		{
> +			if (vlcnum == 0)
> +			{
> +				int levabs, sign; 
> +				int len, inf;
> +
> +				if (level < 0)
> +				{
> +					sign = 1;
> +					levabs = -level;
> +				}
> +				else
> +				{
> +					sign = 0;
> +					levabs = level;
> +				}

int sign= level < 0;
int levabs= ABS(level);

[...]

> +static inline int h264cavlc_get_lookup_table(int na, int nb)
> +{
> +	int nc = 0;
> +	int8_t lookup_table[8] = {0, 0, 1, 1, 2, 2, 2, 2};
> +
> +	if (na >= 0 && nb >= 0)
> +	{
> +		nc = na+nb+1;
> +		nc >>= 1;
> +	}
> +	else 
> +	{
> +		if (na >= 0) // nB < 0
> +			nc = na;
> +		else if (nb >= 0) // nA < 0
> +			nc = nb;
> +	}
> +
> +	return (nc < 8) ? lookup_table[nc] : 3;
> +}

pred_non_zero_count() could be used for the first part

[...]

> +  
> +	// Encode the trailing one sign bits
> +	
> +	for (i = total_coeffs-1, t = trailing_ones ; t > 0 ; i--, t--)
> +	{
> +		if (levels[i] > 0) // +1
> +			put_bits(b,1,0);
> +		else // -1
> +			put_bits(b,1,1);

put_bits(b,1, levels[i] <= 0);

> +	}
> +	
> +	// Encode levels of the remaining nonzero coefficients
> +
> +	if (numlevels > 0)
> +	{
> +		int level_two_or_higher = 1;
> +		int firstlevel = 1;
> +		int vlcnum;
> +		
> +		if (total_coeffs > 3 && trailing_ones == 3)
> +			level_two_or_higher = 0;
> +	
> +		if (total_coeffs > 10 && trailing_ones < 3)
> +			vlcnum = 1;
> +		else
> +			vlcnum = 0;

vlcnum = total_coeffs > 10 && trailing_ones < 3;

> +	
> +		for (i = numlevels-1 ; i >= 0 ; i--)
> +		{
> +			int16_t val = levels[i];
> +			int16_t level = val; // 'level' will contain the absolute value of 'val'
> +	
> +			if (level < 0)
> +				level = -level;

int16_t level= ABS(val);

[...]

> +	for (i = total_coeffs-1 ; i > 0 && total_zeros > 0 ; i--)
> +	{
> +		int runbefore = zeros[i];
> +		int vlcnum = total_zeros-1;
> +		if (vlcnum > 6)
> +			vlcnum = 6;

int vlcnum= FFMIN(total_zeros-1, 6);

> +/* NAL unit types */
> +#define NAL_SLICE                1
> +#define NAL_DPA                  2
> +#define NAL_DPB                  3
> +#define NAL_DPC                  4
> +#define NAL_IDR_SLICE            5
> +#define NAL_SEI                  6
> +#define NAL_SPS                  7
> +#define NAL_PPS                  8
> +#define NAL_AUD                  9
> +#define NAL_END_SEQUENCE        10
> +#define NAL_END_STREAM          11
> +#define NAL_FILLER_DATA         12
> +#define NAL_SPS_EXT             13
> +#define NAL_AUXILIARY_SLICE     19
> +

maybe an enum would be cleaner, maybe not dunno

[...]

> +#define H264_CAVLC_CLIPTABLE_SIZE        65536
> +#define H264_CAVLC_CLIPTABLE_OFFSET    32768

duplicated

[...]

> +#define H264_COPY_4X4BLOCK_TRANSPOSED(xoffset,yoffset,dest,src1,src2) \
> +{ \
> +	dest[0][0] = src1[yoffset+0][xoffset+0]-src2[yoffset+0][xoffset+0]; \
> +	dest[1][0] = src1[yoffset+0][xoffset+1]-src2[yoffset+0][xoffset+1]; \
> +	dest[2][0] = src1[yoffset+0][xoffset+2]-src2[yoffset+0][xoffset+2]; \
> +	dest[3][0] = src1[yoffset+0][xoffset+3]-src2[yoffset+0][xoffset+3]; \
> +	dest[0][1] = src1[yoffset+1][xoffset+0]-src2[yoffset+1][xoffset+0]; \
> +	dest[1][1] = src1[yoffset+1][xoffset+1]-src2[yoffset+1][xoffset+1]; \
> +	dest[2][1] = src1[yoffset+1][xoffset+2]-src2[yoffset+1][xoffset+2]; \
> +	dest[3][1] = src1[yoffset+1][xoffset+3]-src2[yoffset+1][xoffset+3]; \
> +	dest[0][2] = src1[yoffset+2][xoffset+0]-src2[yoffset+2][xoffset+0]; \
> +	dest[1][2] = src1[yoffset+2][xoffset+1]-src2[yoffset+2][xoffset+1]; \
> +	dest[2][2] = src1[yoffset+2][xoffset+2]-src2[yoffset+2][xoffset+2]; \
> +	dest[3][2] = src1[yoffset+2][xoffset+3]-src2[yoffset+2][xoffset+3]; \
> +	dest[0][3] = src1[yoffset+3][xoffset+0]-src2[yoffset+3][xoffset+0]; \
> +	dest[1][3] = src1[yoffset+3][xoffset+1]-src2[yoffset+3][xoffset+1]; \
> +	dest[2][3] = src1[yoffset+3][xoffset+2]-src2[yoffset+3][xoffset+2]; \
> +	dest[3][3] = src1[yoffset+3][xoffset+3]-src2[yoffset+3][xoffset+3]; \

use loop or macros please to simplify that (5 lines for a line copy and
4 line copys is better then 16 element copies)

[...]

> +#ifdef H264_DEBUG_WRITE_DECODED_IMAGE
> +static void ff_h264_append_image(uint8_t *data, AVCodecContext *avctx)
> +{
> +	int f = open("/tmp/teststream.yuv",O_CREAT|O_WRONLY|O_APPEND,S_IRUSR|S_IWUSR);
> +	write(f,data,avctx->width*avctx->height*3/2);
> +	close(f);

write() and close() is not ok in a codec

[...]

> +	for (i = 0 ; i < rbsplen ; i++)
> +	{
> +		if (i + 2 < rbsplen && (rbsp[i] == 0 && rbsp[i+1] == 0 && rbsp[i+2] < 4))
> +		{
> +			dest[destpos++] = rbsp[i++];
> +			dest[destpos++] = rbsp[i];
> +			dest[destpos++] = 0x03; // emulation prevention byte
> +		}
> +		else
> +			dest[destpos++] = rbsp[i];
> +	}

instead of a loop which checks every byte, you could check just every 2nd
also see encode_nal() un h264.c maybe you can use some parts from that

[...]
> +
> +// inblock is transposed, outblock isn't
> +void ff_h264_dct_c(DCTELEM inblock[4][4],DCTELEM outblock[4][4])
> +{
> +	DCTELEM pieces[4][4];
> +	
> +	pieces[0][0] = inblock[0][0]+inblock[1][0]+inblock[2][0]+inblock[3][0];
> +	pieces[0][1] = inblock[0][1]+inblock[1][1]+inblock[2][1]+inblock[3][1];
> +	pieces[0][2] = inblock[0][2]+inblock[1][2]+inblock[2][2]+inblock[3][2];
> +	pieces[0][3] = inblock[0][3]+inblock[1][3]+inblock[2][3]+inblock[3][3];
> +
> +	pieces[1][0] = (inblock[0][0]<<1)+inblock[1][0]-inblock[2][0]-(inblock[3][0]<<1);
> +	pieces[1][1] = (inblock[0][1]<<1)+inblock[1][1]-inblock[2][1]-(inblock[3][1]<<1);
> +	pieces[1][2] = (inblock[0][2]<<1)+inblock[1][2]-inblock[2][2]-(inblock[3][2]<<1);
> +	pieces[1][3] = (inblock[0][3]<<1)+inblock[1][3]-inblock[2][3]-(inblock[3][3]<<1);
> +	
> +	pieces[2][0] = inblock[0][0]-inblock[1][0]-inblock[2][0]+inblock[3][0];
> +	pieces[2][1] = inblock[0][1]-inblock[1][1]-inblock[2][1]+inblock[3][1];
> +	pieces[2][2] = inblock[0][2]-inblock[1][2]-inblock[2][2]+inblock[3][2];
> +	pieces[2][3] = inblock[0][3]-inblock[1][3]-inblock[2][3]+inblock[3][3];
> +
> +	pieces[3][0] = inblock[0][0]-(inblock[1][0]<<1)+(inblock[2][0]<<1)-inblock[3][0];
> +	pieces[3][1] = inblock[0][1]-(inblock[1][1]<<1)+(inblock[2][1]<<1)-inblock[3][1];
> +	pieces[3][2] = inblock[0][2]-(inblock[1][2]<<1)+(inblock[2][2]<<1)-inblock[3][2];
> +	pieces[3][3] = inblock[0][3]-(inblock[1][3]<<1)+(inblock[2][3]<<1)-inblock[3][3];
> +	
> +	outblock[0][0] = pieces[0][0]+pieces[0][1]+pieces[0][2]+pieces[0][3];
> +	outblock[0][1] = pieces[1][0]+pieces[1][1]+pieces[1][2]+pieces[1][3];
> +	outblock[0][2] = pieces[2][0]+pieces[2][1]+pieces[2][2]+pieces[2][3];
> +	outblock[0][3] = pieces[3][0]+pieces[3][1]+pieces[3][2]+pieces[3][3];
> +
> +	outblock[1][0] = (pieces[0][0] << 1)+pieces[0][1]-pieces[0][2]-(pieces[0][3]<<1);
> +	outblock[1][1] = (pieces[1][0] << 1)+pieces[1][1]-pieces[1][2]-(pieces[1][3]<<1);
> +	outblock[1][2] = (pieces[2][0] << 1)+pieces[2][1]-pieces[2][2]-(pieces[2][3]<<1);
> +	outblock[1][3] = (pieces[3][0] << 1)+pieces[3][1]-pieces[3][2]-(pieces[3][3]<<1);
> +
> +	outblock[2][0] = pieces[0][0]-pieces[0][1]-pieces[0][2]+pieces[0][3];
> +	outblock[2][1] = pieces[1][0]-pieces[1][1]-pieces[1][2]+pieces[1][3];
> +	outblock[2][2] = pieces[2][0]-pieces[2][1]-pieces[2][2]+pieces[2][3];
> +	outblock[2][3] = pieces[3][0]-pieces[3][1]-pieces[3][2]+pieces[3][3];
> +
> +	outblock[3][0] = pieces[0][0]-(pieces[0][1]<<1)+(pieces[0][2]<<1)-pieces[0][3];
> +	outblock[3][1] = pieces[1][0]-(pieces[1][1]<<1)+(pieces[1][2]<<1)-pieces[1][3];
> +	outblock[3][2] = pieces[2][0]-(pieces[2][1]<<1)+(pieces[2][2]<<1)-pieces[2][3];
> +	outblock[3][3] = pieces[3][0]-(pieces[3][1]<<1)+(pieces[3][2]<<1)-pieces[3][3];
> +}	

h264.c h264_diff_dct_c() looks so much nicer, is this faster?

[...]

> +
> +static inline void ff_h264_hadamard_invquant_2x2(int16_t Y[2][2], int QP)
> +{
> +	int32_t V = V00[QP%6];
> +	int div = QP/6;
> +
> +	Y[0][0] = ((Y[0][0]*V) << div) >> 5;
> +	Y[0][1] = ((Y[0][1]*V) << div) >> 5;
> +	Y[1][0] = ((Y[1][0]*V) << div) >> 5;
> +	Y[1][1] = ((Y[1][1]*V) << div) >> 5;

V <<= div; and no <<div obviously

[...]

> +static inline void ff_h264_neighbour_count_nonzero(MacroBlock *mb, int type, int x, int y, int *nA, int *nB)
> +{
> +	if (type == NEIGHBOUR_SUBTYPE_Y) // Y
> +	{
> +		if (x == 0)
> +		{
> +			MacroBlock *leftmb = mb->leftblock;
> +
> +			if (!leftmb)
> +				*nA = -1;
> +			else
> +				*nA = leftmb->Y_nonzero[y][3];
> +		}
> +		else
> +			*nA = mb->Y_nonzero[y][x-1];
> +
> +		if (y == 0)
> +		{
> +			MacroBlock *topmb = mb->topblock;
> +
> +			if (!topmb)
> +				*nB = -1;
> +			else
> +				*nB = topmb->Y_nonzero[3][x];
> +		}
> +		else
> +			*nB = mb->Y_nonzero[y-1][x];
> +	}
> +	else if (type == NEIGHBOUR_SUBTYPE_U) // U
> +	{
> +		if (x == 0)
> +		{
> +			MacroBlock *leftmb = mb->leftblock;
> +
> +			if (!leftmb)
> +				*nA = -1;
> +			else
> +				*nA = leftmb->U_nonzero[y][1];
> +		}
> +		else
> +			*nA = mb->U_nonzero[y][x-1];
> +
> +		if (y == 0)
> +		{
> +			MacroBlock *topmb = mb->topblock;
> +
> +			if (!topmb)
> +				*nB = -1;
> +			else
> +				*nB = topmb->U_nonzero[1][x];
> +		}
> +		else
> +			*nB = mb->U_nonzero[y-1][x];
> +	}
> +	else // V
> +	{
> +		if (x == 0)
> +		{
> +			MacroBlock *leftmb = mb->leftblock;
> +
> +			if (!leftmb)
> +				*nA = -1;
> +			else
> +				*nA = leftmb->V_nonzero[y][1];
> +		}
> +		else
> +			*nA = mb->V_nonzero[y][x-1];
> +
> +		if (y == 0)
> +		{
> +			MacroBlock *topmb = mb->topblock;
> +
> +			if (!topmb)
> +				*nB = -1;
> +			else
> +				*nB = topmb->V_nonzero[1][x];
> +		}
> +		else
> +			*nB = mb->V_nonzero[y-1][x];
> +	}
> +}

ugh, thats not duplicate thats triplicated ;)

[...]

> +	if (chromaDCcount == 0)
> +	{
> +		if (chromaACcount == 0)
> +			CodedBlockPatternChroma = 0;
> +		else
> +			CodedBlockPatternChroma = 2;
> +	}
> +	else
> +	{
> +		if (chromaACcount == 0)
> +			CodedBlockPatternChroma = 1;
> +		else
> +			CodedBlockPatternChroma = 2;
> +	}

if(chromaACcount) CodedBlockPatternChroma= 2;
else              CodedBlockPatternChroma= !!chromaDCcount;

[...]
> +	if (CodedBlockPatternLuma > 0)
> +	{
> +		for (j = 0 ; j < 4 ; j++)
> +		{
> +			int X = (j%2)*2;
> +			int Y = (j/2)*2;

% and / in more or less inner loops are not ok, especially when this is
just (j&1)*2 and j&2

[...]
> +		int x,y;
> +
> +		for (y = 0 ; y < 4 ; y++)
> +			for (x = 0 ; x < 4 ; x++)
> +				mb->Y_nonzero[y][x] = 0;

memset() ?

> +	}
> +	
> +	if (CodedBlockPatternChroma == 0)
> +	{
> +		int x,y;
> +
> +		for (y = 0 ; y < 2 ; y++)
> +		{
> +			for (x = 0 ; x < 2 ; x++)
> +			{
> +				mb->U_nonzero[y][x] = 0;
> +				mb->V_nonzero[y][x] = 0;

again memset() should be useable here, also look at fill_rectangle() from
h264.c

> +			}
> +		}
> +		return;
> +	}
> +	
> +	if (CodedBlockPatternChroma != 0)
> +	{
> +		coefficients[0] = UD[0][0];
> +		coefficients[1] = UD[0][1];
> +		coefficients[2] = UD[1][0];
> +		coefficients[3] = UD[1][1];
> +		h264cavlc_encode(b,coefficients,4,-1,-1,1); // nA and nB are not used in this case
> +		
> +		coefficients[0] = VD[0][0];
> +		coefficients[1] = VD[0][1];
> +		coefficients[2] = VD[1][0];
> +		coefficients[3] = VD[1][1];
> +		h264cavlc_encode(b,coefficients,4,-1,-1,1); // nA and nB are not used in this case

dulicate -> loop or macro

[...]
> +	for (y = 0 ; !done && y < 4 ; y++) for (x = 0 ; !done && x < 4 ; x++) if (residual->part4x4Y[0][0][y][x] != 0) done = 1;
> +	for (y = 0 ; !done && y < 4 ; y++) for (x = 0 ; !done && x < 4 ; x++) if (residual->part4x4Y[0][1][y][x] != 0) done = 1;
> +	for (y = 0 ; !done && y < 4 ; y++) for (x = 0 ; !done && x < 4 ; x++) if (residual->part4x4Y[1][0][y][x] != 0) done = 1;
> +	for (y = 0 ; !done && y < 4 ; y++) for (x = 0 ; !done && x < 4 ; x++) if (residual->part4x4Y[1][1][y][x] != 0) done = 1;

find a new keyboard your enter key is broken

[...]

> +
> +#ifdef H264_ENABLE_QPEL
> +
> +#define H264_QPEL_6TAP(a,b,c,d,e,f) ((a)-5*(b)+20*(c)+20*(d)-5*(e)+(f))
> +#define H264_QPEL_AVG(a,b) (((a)+(b)+1)>>1)
> +
> +static inline void ff_h264_calc_Yqpelpixels(const uint8_t *src, int srcstride, uint8_t *dst, int dststride1,int dststride2)
> +{
> +	int b1,h1,s1,m1;
> +	int aa,bb,gg,hh;
> +	int j1,j,b,h;
> +	const uint8_t *pos = src - 2 - 2*srcstride;
> +	const uint8_t *pos2 = pos + 2;
> +	int s,m;
> +	int a,c,d,n,f,i,k,q;
> +	int e,g,p,r;
> +	
> +	aa = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> +	pos += srcstride;
> +	bb = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> +	pos += srcstride;
> +	b1 = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> +	pos += srcstride;
> +	s1 = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> +	pos += srcstride;
> +	gg = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> +	pos += srcstride;
> +	hh = H264_QPEL_6TAP((int)pos[0],(int)pos[1],(int)pos[2],(int)pos[3],(int)pos[4],(int)pos[5]);
> +	h1 = H264_QPEL_6TAP((int)pos2[0],(int)pos2[srcstride],(int)pos2[srcstride*2],(int)pos2[srcstride*3],
> +			       (int)pos2[srcstride*4],(int)pos2[srcstride*5]);
> +	pos2++;
> +	m1 = H264_QPEL_6TAP((int)pos2[0],(int)pos2[srcstride],(int)pos2[srcstride*2],(int)pos2[srcstride*3],
> +			       (int)pos2[srcstride*4],(int)pos2[srcstride*5]);
> +	
> +	j1 = H264_QPEL_6TAP(aa,bb,b1,s1,gg,hh);
> +	b = clip_uint8((b1+16)>>5);
> +	h = clip_uint8((h1+16)>>5);
> +	j = clip_uint8((j1+512)>>10);
> +	s = clip_uint8((s1+16)>>5);
> +	m = clip_uint8((m1+16)>>5);
> +	
> +	a = H264_QPEL_AVG((int)src[0],b);
> +	c = H264_QPEL_AVG((int)src[1],b);
> +	d = H264_QPEL_AVG((int)src[0],h);
> +	n = H264_QPEL_AVG((int)src[srcstride],h);
> +	f = H264_QPEL_AVG(b,j);
> +	i = H264_QPEL_AVG(h,j);
> +	k = H264_QPEL_AVG(j,m);
> +	q = H264_QPEL_AVG(j,s);
> +
> +	e = H264_QPEL_AVG(b,h);
> +	g = H264_QPEL_AVG(b,m);
> +	p = H264_QPEL_AVG(h,s);
> +	r = H264_QPEL_AVG(m,s);
> +
> +	dst[0] = src[0];
> +	dst[dststride1] = a;
> +	dst[dststride1*2] = b;
> +	dst[dststride1*3] = c;
> +	dst += dststride2;
> +	dst[0] = d;
> +	dst[dststride1] = e;
> +	dst[dststride1*2] = f;
> +	dst[dststride1*3] = g;
> +	dst += dststride2;
> +	dst[0] = h;
> +	dst[dststride1] = i;
> +	dst[dststride1*2] = j;
> +	dst[dststride1*3] = k;
> +	dst += dststride2;
> +	dst[0] = n;
> +	dst[dststride1] = p;
> +	dst[dststride1*2] = q;
> +	dst[dststride1*3] = r;	
> +}

this looks like it duplicates put_h264_qpel_pixels_tab or similar from dsputil

[...]

> +	
> +	while (!done && numsteps < MAXSEARCHSTEPS)
> +	{
> +		int startx = curx - SEARCHWIDTH;
> +		int starty = cury - SEARCHWIDTH;
> +		int stopx = curx + SEARCHWIDTH + 1;
> +		int stopy = cury + SEARCHWIDTH + 1;
> +		int foundbetter = 0;
> +		int scanx, scany;
> +
> +		if (startx < 0)
> +			startx = 0;
> +		if (starty < 0)
> +			starty = 0;
> +		if (stopx > t->refframe_width - 16 + 1)
> +			stopx = t->refframe_width - 16 + 1;
> +		if (stopy > t->refframe_height - 16 + 1)
> +			stopy = t->refframe_height -16 + 1;
> +	
> +		for(scany = starty; scany < stopy; scany++)
> +		{
> +			for(scanx = startx; scanx < stopx; scanx++)
> +			{
> +				if (!(curx == scanx && cury == scany))
> +				{
> +					int xvec = (scanx-x)*4-pred_mvx; // it's actually this difference which will be encoded!
> +					int yvec = (scany-y)*4-pred_mvy;
> +					int bitsize;
> +					int xmod = scanx%2;
> +					int ymod = scany%2;
> +					int absnum = xmod+ymod*2;
> +					int sae = t->dspcontext.pix_abs[0][0](0,targetmb->Y[0], 
> +						refframe->reconstructed_picture.data[0]	+ scany * refframe->reconstructed_picture.linesize[0] + scanx,
> +						refframe->reconstructed_picture.linesize[0], 16);
> +					
> +					sae += t->dspcontext.pix_abs[1][absnum](0,targetmb->U[0], 
> +						refframe->reconstructed_picture.data[1]	+ (scany/2) * refframe->reconstructed_picture.linesize[1] + scanx/2,
> +						refframe->reconstructed_picture.linesize[1], 8);
> +					sae += t->dspcontext.pix_abs[1][absnum](0,targetmb->V[0], 
> +						refframe->reconstructed_picture.data[2]	+ (scany/2) * refframe->reconstructed_picture.linesize[2] + scanx/2,
> +						refframe->reconstructed_picture.linesize[2], 8);

it would be nice if the comparission function where user selectable instead
of hardcoded like in the rest of lavc too

[...]

> +static inline int ff_h264_median(int x, int y, int z)
> +{
> +	return x+y+z-FFMIN(x,FFMIN(y,z))-FFMAX(x,FFMAX(y,z));
> +}

mid_pred()

> +
> +// Adjust the values of mvx and mvy based on the prediction from the neighbouring macroblocks
> +static void ff_h264_estimate_motion_vectors(MacroBlock *destmb, int *mvpred_x, int *mvpred_y, int *mvpred_x2, int *mvpred_y2)
> +{
> +	int mvAx = 0, mvAy = 0;
> +	int mvBx = 0, mvBy = 0;
> +	int mvCx = 0, mvCy = 0;
> +	int mvDx = 0, mvDy = 0;
> +	int Aavail = 0;
> +	int Bavail = 0;
> +	int Cavail = 0;
> +	int Davail = 0;
> +
> +	if (destmb->leftblock != NULL && destmb->leftblock->available)
> +	{
> +		Aavail = 1;
> +		mvAx = destmb->leftblock->mv_x;
> +		mvAy = destmb->leftblock->mv_y;
> +	}
> +	if (destmb->topblock != NULL)
> +	{
> +		MacroBlock *topblock = destmb->topblock;
> +
> +		if (topblock->available)
> +		{
> +			Bavail = 1;
> +			mvBx = topblock->mv_x;
> +			mvBy = topblock->mv_y;
> +		}
> +		if (topblock->leftblock != NULL && topblock->leftblock->available)
> +		{
> +			Davail = 1;
> +			mvDx = topblock->leftblock->mv_x;
> +			mvDy = topblock->leftblock->mv_y;
> +		}
> +		if (topblock->rightblock != NULL && topblock->rightblock->available)
> +		{
> +			Cavail = 1;
> +			mvCx = topblock->rightblock->mv_x;
> +			mvCy = topblock->rightblock->mv_y;
> +		}
> +	}
> +
> +	if (!Cavail)
> +	{
> +		Cavail = Davail;
> +		mvCx = mvDx;
> +		mvCy = mvDy;
> +	}
> +
> +	if (!Bavail && !Cavail && Aavail)
> +	{
> +		mvBx = mvAx;
> +		mvBy = mvAy;
> +		mvCx = mvAx;
> +		mvCy = mvAy;
> +	}
> +	
> +	*mvpred_x = ff_h264_median(mvAx,mvBx,mvCx);
> +	*mvpred_y = ff_h264_median(mvAy,mvBy,mvCy);
> +
> +	if (!Aavail || !Bavail || (Aavail && mvAx == 0 && mvAy == 0) || (Bavail && mvBx == 0 && mvBy == 0))
> +	{
> +		*mvpred_x2 = 0;
> +		*mvpred_y2 = 0;
> +	}
> +	else
> +	{
> +		*mvpred_x2 = *mvpred_x;
> +		*mvpred_y2 = *mvpred_y;
> +	}
> +}

this looks like its a duplicate of h264.c pred_motion()

[...]
> +
> +#define H264_CAVLC_CLIPTABLE_SIZE        65536

i think ive seen this at least twice already ...

[...]

> +static const int sae_mapping[SAE_ENCODED_SIZE_SIZE] = [ ... one 30kb long line snipped ... ]
> +

uhm

[...]

> +
> +/**
> + * Can contain pointers to the relevant starting points in a picture
> + */
> +typedef struct _MacroBlock

stuff begining with _ is reserved in C ...

[...]

[not reviewed mmx code]

> +/**
> + * |ZD(i,j)| = (|YD(i,j)| MF(0,0) + 2 f) >> (qbits + 1)
> + *
> + */
> +void ff_h264_hadamard_quant_2x2_mmx(int16_t Y[2][2], int QP)
> +{
> +    int qbits = 15 + QP/6;
> +    int f2 = ((1 << qbits) / 3)*2;
> +    int shift = qbits+1;
> +    int32_t shift1[1];
> +    int32_t f22[2];
> +    int16_t MF = MF00[QP%6];
> +    int16_t MF2[4];
> +    MF2[0] = MF;
> +    MF2[1] = MF;
> +    MF2[2] = MF;
> +    MF2[3] = MF;

int16_t MF2[4]= {MF,MF,MF,MF}; is cleaner IMHO

[...]

also remove tabs and trailing whitespace, they arent allowed in ffmpeg svn

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is