[FFmpeg-devel] [PATCH] G.723.1 Decoder

Thu Aug 12 04:23:06 CEST 2010

Hi,

On Thu, Aug 5, 2010 at 9:28 AM, Mohamed Naufal <naufal11 at gmail.com> wrote:
> +typedef struct {
> +    int16_t ad_cb_lag;     ///< adaptive codebook lag
> +    int16_t ad_cb_gain;
> +    int16_t trans_gain;
> +    int16_t pulse_sign;
> +    int16_t grid_index;
> +    int16_t amp_index;
> +    int32_t pulse_pos;
> +} G723_1_Subframe;
[..]
> +typedef struct {
> +    int16_t index;    ///< postfilter backward/forward lag
> +    int16_t opt_gain; ///< optimal gain
> +    int16_t sc_gain;  ///< scaling gain
> +} PPFParam;

Why are these not regular ints?

Same for struct g723_1_context:

> +    int16_t random_seed;
> +    int16_t interp_index;
> +    int16_t interp_gain;
> +    int16_t sid_gain;
> +    int16_t cur_gain;
> +
> +    int16_t reflection_coef;
[..]
> +    int16_t pf_gain;                 ///< formant postfilter
> +                                     ///< gain scaling unit memory

> +static int unpack_bitstream(G723_1_Context *p, const uint8_t *buf,
> +                            int buf_size)
[..]
> +    for (i = 0; i < SUBFRAMES; i++)
> +        p->subframe[i].grid_index = get_bits(&gb, 1);
[..]
> +    } else { /* Rate5k3 */
> +        for (i = 0; i < SUBFRAMES; i++)
> +            p->subframe[i].pulse_pos  = get_bits(&gb, 12);
> +
> +        for (i = 0; i < SUBFRAMES; i++)
> +            p->subframe[i].pulse_sign = get_bits(&gb, 4);
> +    }

You should probably unroll these manually.

> +static int16_t square_root(int val)
[..]
> +    for (i = 0; i < 14; i ++) {
> +        temp = (res+exp) * (res+exp) << 1;

int res_exp = res+exp; temp = res_exp*res_exp<<1

> +        if (val >= temp)
> +            res += exp;

Also, since temp is only used once, you don't actually need it.

Unimportant, but is res|exp faster than res+exp?

> +static inline int16_t prand(int16_t *rseed)
> +{
> +    *rseed = *rseed * 521 + 259;
> +    return *rseed;
> +}

This derefs rseed twice (I think; check disassembly). Better cache it
in a local variable and return that to prevent the second deref.

> +static inline int dot_product(const int16_t *v1, const int16_t *v2, int length)

I don't mind if you move this to celp_math.[ch], and you can probably
extract this from other int-speech codecs also then.

> +    int64_t prod;
> +
> +    for (i = 0; i < length; i++) {
> +        prod = av_clipl_int32((int64_t)v1[i] * v2[i] << 1);

You should declare the int64_t with a local scope here so gcc knows it
doesn't have to remember its value...

> +static int16_t normalize_bits_int16(int16_t x)
> +{
> +    int16_t i = 0;

For both function return type and this var: why int16_t?

> +static int16_t normalize_bits_int32(int x)

You can probably make a macro since these two are so similar.

> +static int16_t scale_vector(int16_t *vector, int16_t length)

Why return type int16_t?

+    int16_t bits, scale, max = 0;

Same.

+        int16_t v = FFABS(vector[i]);

Same.

> +static void inverse_quant(int16_t *cur_lsp, int16_t *prev_lsp,
> +                          uint8_t *lsp_index, int bad_frame)
[..]
> +    if (!bad_frame) {
> +        min_dist = 0x100;
> +        pred = 12288;
> +    } else {
> +        min_dist = 0x200;
> +        pred = 23552;
> +        lsp_index[0] = lsp_index[1] = lsp_index[2] = 0;
> +    }

Vertical align here.

> +static void lsp2lpc(int16_t *lpc)

How does this compare to ff_acelp_lsp2lpc()? A comment would be nice,
sharing code better.

> +static int16_t autocorr_max(G723_1_Context *p, int offset, int *ccr_max,
> +                            int16_t pitch_lag, int length)
[..]
> +static int16_t autocorr_max_inv(G723_1_Context *p, int offset, int *ccr_max,
> +                                int16_t pitch_lag, int length)

These two are highly similar, can they be combined?

> +static void residual_interp(int16_t *buf, int16_t *out, int16_t lag,
> +                            int16_t gain, int16_t *rseed)
[..]
> +    if (lag) { /* Voiced */
> +        int16_t *vector_ptr = buf + PITCH_MAX;
> +        /* Attenuate */
> +        for (i = 0; i < lag; i++)
> +            vector_ptr[i - lag] = vector_ptr[i - lag] * 3 >> 2;
> +        for (i = 0; i < FRAME_LEN; i++)
> +            vector_ptr[i] = vector_ptr[i - lag];

av_memcpy_backptr().

> +        memcpy(out, vector_ptr, FRAME_LEN * sizeof(int16_t));

The temp buffer shouldn't be strictly necessary.

+    } else {  /* Unvoiced */
+        for (i = 0; i < FRAME_LEN; i++)
+            out[i] = gain * prand(rseed) >> 15;
+        memset(buf, 0, (FRAME_LEN + PITCH_MAX) * sizeof(int16_t));

Given that this is the only use of prand(), I wonder if you could
optimize it by integrating prand() in here, so that you don't have to
keep de-de-derefing rseed within the loop. AFAICS, this is also the
only use of the "random seed" anyway.

> +static void iir_filter(int16_t *lpc, int16_t *src,
> +                       int *dest)

celp_filters.h calls this zero_synthesis_filter, can this be merged in there?

> +static void gain_scale(G723_1_Context *p, int16_t * buf, int energy)
> +{
> +    for (i = 0; i < SUBFRAME_LEN; i++) {
> +        temp  = av_clipl_int32((int64_t)(buf[i] >> 2) * (buf[i] >> 2) << 1);
> +        denom = av_clipl_int32((int64_t)denom + temp);
> +    }

I believe Mans once wrote a macro for simplifying 64-bit
multiplications, these could also be optimized on archs supporting
32x32->64muls (such as muleax->eax/edx on x86, I believe Mans cared
about it for Arm though), called MUL64()? Also applies to a few other
places.

Very pretty all in all, I feel there's quite some duplication with
*celp*.c, but most of what I could find was float-only. Maybe it'd be
nice if you could have a brief look at our int speech decoders and see
how much can be shared between them, but I won't insist, this looks
quite good in general.

Ronald