[FFmpeg-devel] [PATCH] G.723.1 Decoder
Ronald S. Bultje
rsbultje
Thu Aug 12 04:23:06 CEST 2010
Hi,
On Thu, Aug 5, 2010 at 9:28 AM, Mohamed Naufal <naufal11 at gmail.com> wrote:
> +typedef struct {
> + int16_t ad_cb_lag; ///< adaptive codebook lag
> + int16_t ad_cb_gain;
> + int16_t trans_gain;
> + int16_t pulse_sign;
> + int16_t grid_index;
> + int16_t amp_index;
> + int32_t pulse_pos;
> +} G723_1_Subframe;
[..]
> +typedef struct {
> + int16_t index; ///< postfilter backward/forward lag
> + int16_t opt_gain; ///< optimal gain
> + int16_t sc_gain; ///< scaling gain
> +} PPFParam;
Why are these not regular ints?
Same for struct g723_1_context:
> + int16_t random_seed;
> + int16_t interp_index;
> + int16_t interp_gain;
> + int16_t sid_gain;
> + int16_t cur_gain;
> +
> + int16_t reflection_coef;
[..]
> + int16_t pf_gain; ///< formant postfilter
> + ///< gain scaling unit memory
> +static int unpack_bitstream(G723_1_Context *p, const uint8_t *buf,
> + int buf_size)
[..]
> + for (i = 0; i < SUBFRAMES; i++)
> + p->subframe[i].grid_index = get_bits(&gb, 1);
[..]
> + } else { /* Rate5k3 */
> + for (i = 0; i < SUBFRAMES; i++)
> + p->subframe[i].pulse_pos = get_bits(&gb, 12);
> +
> + for (i = 0; i < SUBFRAMES; i++)
> + p->subframe[i].pulse_sign = get_bits(&gb, 4);
> + }
You should probably unroll these manually.
> +static int16_t square_root(int val)
[..]
> + for (i = 0; i < 14; i ++) {
> + temp = (res+exp) * (res+exp) << 1;
int res_exp = res+exp; temp = res_exp*res_exp<<1
> + if (val >= temp)
> + res += exp;
Also, since temp is only used once, you don't actually need it.
Unimportant, but is res|exp faster than res+exp?
> +static inline int16_t prand(int16_t *rseed)
> +{
> + *rseed = *rseed * 521 + 259;
> + return *rseed;
> +}
This derefs rseed twice (I think; check disassembly). Better cache it
in a local variable and return that to prevent the second deref.
> +static inline int dot_product(const int16_t *v1, const int16_t *v2, int length)
I don't mind if you move this to celp_math.[ch], and you can probably
extract this from other int-speech codecs also then.
> + int64_t prod;
> +
> + for (i = 0; i < length; i++) {
> + prod = av_clipl_int32((int64_t)v1[i] * v2[i] << 1);
You should declare the int64_t with a local scope here so gcc knows it
doesn't have to remember its value...
> +static int16_t normalize_bits_int16(int16_t x)
> +{
> + int16_t i = 0;
For both function return type and this var: why int16_t?
> +static int16_t normalize_bits_int32(int x)
You can probably make a macro since these two are so similar.
> +static int16_t scale_vector(int16_t *vector, int16_t length)
Why return type int16_t?
+ int16_t bits, scale, max = 0;
Same.
+ int16_t v = FFABS(vector[i]);
Same.
> +static void inverse_quant(int16_t *cur_lsp, int16_t *prev_lsp,
> + uint8_t *lsp_index, int bad_frame)
[..]
> + if (!bad_frame) {
> + min_dist = 0x100;
> + pred = 12288;
> + } else {
> + min_dist = 0x200;
> + pred = 23552;
> + lsp_index[0] = lsp_index[1] = lsp_index[2] = 0;
> + }
Vertical align here.
> +static void lsp2lpc(int16_t *lpc)
How does this compare to ff_acelp_lsp2lpc()? A comment would be nice,
sharing code better.
> +static int16_t autocorr_max(G723_1_Context *p, int offset, int *ccr_max,
> + int16_t pitch_lag, int length)
[..]
> +static int16_t autocorr_max_inv(G723_1_Context *p, int offset, int *ccr_max,
> + int16_t pitch_lag, int length)
These two are highly similar, can they be combined?
> +static void residual_interp(int16_t *buf, int16_t *out, int16_t lag,
> + int16_t gain, int16_t *rseed)
[..]
> + if (lag) { /* Voiced */
> + int16_t *vector_ptr = buf + PITCH_MAX;
> + /* Attenuate */
> + for (i = 0; i < lag; i++)
> + vector_ptr[i - lag] = vector_ptr[i - lag] * 3 >> 2;
> + for (i = 0; i < FRAME_LEN; i++)
> + vector_ptr[i] = vector_ptr[i - lag];
av_memcpy_backptr().
> + memcpy(out, vector_ptr, FRAME_LEN * sizeof(int16_t));
The temp buffer shouldn't be strictly necessary.
+ } else { /* Unvoiced */
+ for (i = 0; i < FRAME_LEN; i++)
+ out[i] = gain * prand(rseed) >> 15;
+ memset(buf, 0, (FRAME_LEN + PITCH_MAX) * sizeof(int16_t));
Given that this is the only use of prand(), I wonder if you could
optimize it by integrating prand() in here, so that you don't have to
keep de-de-derefing rseed within the loop. AFAICS, this is also the
only use of the "random seed" anyway.
> +static void iir_filter(int16_t *lpc, int16_t *src,
> + int *dest)
celp_filters.h calls this zero_synthesis_filter, can this be merged in there?
> +static void gain_scale(G723_1_Context *p, int16_t * buf, int energy)
> +{
> + for (i = 0; i < SUBFRAME_LEN; i++) {
> + temp = av_clipl_int32((int64_t)(buf[i] >> 2) * (buf[i] >> 2) << 1);
> + denom = av_clipl_int32((int64_t)denom + temp);
> + }
I believe Mans once wrote a macro for simplifying 64-bit
multiplications, these could also be optimized on archs supporting
32x32->64muls (such as muleax->eax/edx on x86, I believe Mans cared
about it for Arm though), called MUL64()? Also applies to a few other
places.
Very pretty all in all, I feel there's quite some duplication with
*celp*.c, but most of what I could find was float-only. Maybe it'd be
nice if you could have a brief look at our int speech decoders and see
how much can be shared between them, but I won't insist, this looks
quite good in general.
Ronald
More information about the ffmpeg-devel
mailing list