[FFmpeg-devel] [PATCH] WMA Voice decoder
Reimar Döffinger
Reimar.Doeffinger
Sun Jan 31 00:48:31 CET 2010
On Sat, Jan 30, 2010 at 05:06:45PM -0500, Ronald S. Bultje wrote:
> +static const struct frame_type_desc {
> + short n_blocks, ///< amount of blocks per frame
> + ///< (each block contains 160/#n_blocks samples)
> + log_n_blocks, ///< log2(n_blocks)
> + acb_type, ///< Adaptive codebook type in frame/block:
> + ///< - 0: no adaptive codebook (only hardcoded fixed)
> + ///< - 1: adaptive codebook with per-frame pitch,
> + ///< - 2: adaptive codebook with per-block pitch
> + fcb_type, ///< Fixed codebook type in frame/block:
> + ///< - 0: hardcoded codebook, per-frame gain,
> + ///< is used as comfort noise during silence,
> + ///< - 1: hardcoded codebook, per-block gain,
> + ///< - 2: pitch-adaptive window (AW) pulse signal,
> + ///< - 3-5: innovation (fixed) codebook pulses with
> + ///< different combinations of single/double
> + ///< pulses (see #dbl_pulses)
> + dbl_pulses, ///< how many pulse vectors have pulse pairs
> + ///< (rather than just one single pulse)
> + ///< only if #fcb_type >= 3 && <= 5
> + frame_size; ///< the amount of bits that make up the block
> + ///< data (per frame)
"short"? Also specifying the type only once is ugly.
Also if you use frame_size with a shift they all can be uint8_t
(though probably it's better to just "waste" those 34 bytes and
leave frame_size 16 bits, but the others don't need to be.
> + uint8_t sframe_cache[SFRAME_CACHE_SIZE + FF_INPUT_BUFFER_PADDING_SIZE];
> + ///< cache for superframe data split over
> + ///< multiple packets
> + PutBitContext pb; ///< points into #sframe_cache
"points" is a strange wording.
> + int cache_sframe_size; ///< set to >0 if we have data from an
I'd put that right next to sframe_cache.
Also having cache_sframe_size and SFRAME_CACHE_SIZE is a bit confusing,
the macro maybe should have a "MAX", it should be consistently either
sframe_cache or cache_sframe.
> + short aw_n_pulses[2]; ///< number of AW-pulses in each block
> + short aw_first_pulse_off[2]; ///< index of first sample to which to
> + ///< apply AW-pulses, or -0xff if unset
More shorts...
> + float gain_pred_err[6]; ///< cache for future gain prediction
> + float excitation_history[MAX_SIGNAL_HISTORY];
> + ///< cache of the signal of previous
> + ///< superframes, used as a history for
> + ///< future signal generation
"future ..." sounds a bit strange to me, I would understand that to mean
an unimplemented feature, but that's probably not meant.
> +static av_cold int decode_vbmtree(GetBitContext *gb, uint8_t *vbm_tree)
> +{
> + unsigned int cntr[8], n, res;
> +
> + memset(vbm_tree, -1, sizeof(uint8_t) * 25);
Personally I consider 0xff nicer with memset
Also sizeof(uint8_t) sure is pointless.
And maybe change
uint8_t *vbm_tree to uint8_t vbm_tree[25] to document the expected size?
> + memset(cntr, 0, sizeof(cntr));
> + for (n = 0; n < 17; n++) {
> + res = get_bits(gb, 3);
> + if (cntr[res] >= 3 + (res == 7))
> + return -1;
What is the reason this can't just be cntr[res] > 3?
(assuming performance matters, documenting can't hurt
either way).
> + vbm_tree[res * 3 + cntr[res]++] = n;
> + }
Making cntr unsigned seems like quite some overkill (I think we generally
use int unless we need to use unsigned, right?).
> + /**
> + * Extradata layout:
> + * - byte 0-18: WMAPro-in-WMAVoice extradata (see wmaprodec.c),
> + * - byte 19-22: flags field (annoyingly in LE; see below for known
> + * values),
> + * - byte 23-46: variable bitmode tree (really just 25 * 3 bits,
> + * rest is 0).
> + */
> + if (ctx->extradata_size != 0x2E) {
> + av_log(ctx, AV_LOG_ERROR,
> + "Invalid extradata size 0x%x != 0x2E\n", ctx->extradata_size);
Description in decimal + check in hex = bad
> + if (flags & 0x1000) {
A name for that wouldn't hurt.
> + for (n = 0; n < s->lsps; n++)
> + s->prev_lsps[n] = M_PI * (n + 1.0) / (s->lsps + 1.0);
Btw. if this or anything else is state that is kept between frames,
you should have a flush function that resets it to avoid artefacts
after seeking.
> + s->min_pitch_val = ((int) ((ctx->sample_rate << 8) * 0.0025) + 50) >> 8;
> + s->max_pitch_val = ((int) ((ctx->sample_rate << 8) * 0.0185) + 50) >> 8;
Those floats in there look very suspicious to me.
Wouldn't / 400 and * 37 / 2000 be more reliable (assuming the *32 can not cause an overflow)?
> + av_log(ctx, AV_LOG_ERROR, "Unsupported samplerate %d (causes history=%d, max=%d)\n",
"causes" is a strange word to use.
> + s->block_pitch_range = s->block_conv_table[2] +
> + s->block_conv_table[3] + 1 +
> + 2 * (s->block_conv_table[1] -
> + (2 * s->min_pitch_val));
Not very readable. Maybe also because of the pointless innermost ()
> +static int get_vbm_bits(GetBitContext *gb)
> +{
> + int n, res;
> +
> + for (n = 0; ; n++) {
> + res = get_bits(gb, 2);
> + if (res < 3 || n == 6 /** don't increase n to 7 */)
> + break;
> + }
So
for (n = -1; n < 7 && res < 3; n++)
or maybe better
for (n = 0; n < 8 && res < 3; n++) {
...
}
return 3 * (n - 1) + res;
or whatever
Also don't use doxy-style comments for things that doxygen can not
make any sense of.
> + * @param sizes range (well, max. value) of each quantized value in @values
well...
> +static void dequant_lsps(double *lsps, int num,
> + const uint16_t *values, const uint16_t *sizes,
> + int n_stages, const uint8_t *table,
> + const double *mul_q, const double *base_q)
> +{
> + int n, m;
> +
> + memset(lsps, 0, num * sizeof(double));
sizeof(*lsps), we should not make it needlessly hard to switch to float.
> + for (n = 0; n < n_stages; table += sizes[n++] * num)
Only the n++ belongs inside the for(), everything else IMO is too hard to read.
> + for (m = 0; m < num; m++)
> + lsps[m] += base_q[n] + mul_q[n] * table[m + values[n] * num];
I consider it better to extract constant stuff like values[n]*num outside the loop.
Particularly since due to aliasing rules the compiler might not be able to
to pull e.g. the base_q[n] out without that (hm, I admit I don't know for sure
if the const might be enough for that).
> + for (n = 0; n < 10; n++) {
> + a1[n] = ipol_tab[interpol][0][n] * (old[n] - i_lsps[n]) +
> + i_lsps[n];
> + a1[10 + n] = ipol_tab[interpol][1][n] * (old[n] - i_lsps[n]) +
> + i_lsps[n];
The subtraction is duplicated.
Also if speed matters it might make sense to check if gcc handles a flattened
ipol_tab better (i.e. ipol_tab[interpol][10 + n] instead of ipol_tab[interpol][1][n]).
> + dequant_lsps(&lsps[5], 5, &v[2], &vec_sizes[2], 2,
> + ff_wmavoice_dq_lsp16i2, &mul_lsf[2], &base_lsf[2]);
I think lsps + 5, v + 2 etc. is the more common style in FFmpeg.
> +#define NO_OFFSET -0xff
Why -255? And using negative hex numbers is really close to obfuscation IMO.
> +/**
> + * Parse the offset of the first pitch-adaptive window pulses, and
> + * the distribution of pulses between the two blocks in this frame.
> + * @param ctx WMA Voice decoding context
> + * @param gb bit I/O context
> + * @param pitch pitch for each block in this frame
> + */
> +static void aw_parse_coords(AVCodecContext *ctx, GetBitContext *gb,
> + const short *pitch)
> +{
> + static const uint8_t start_offset[94] = {
> + 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22,
> + 24, 26, 29, 28, 30, 31, 32, 33, 34, 35, 36, 37,
> + 38, 39, 40, 41, 42, 43, 44, 46, 48, 50, 52, 54,
> + 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78,
> + 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102,
> + 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126,
> + 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150,
> + 152, 154, 156, 158, 160, 162, 164, 166, 168, 170
Uh, what's the magic behind this table?
And is that one non-monotonous value really correct?
> + short off_table[11], first_idx[2];
More shorts. Get rid of them all.
> + s->aw_idx_is_ext = 0;
> + if ((bits = get_bits(gb, 6)) >= 54) {
IMO never ever munge assignments and ifs, it is pointless obfuscation.
> + s->aw_pitch_range = 16 + 8 * (FFMIN(pitch[0], pitch[1]) > 32);
Both
s->aw_pitch_range = 16;
if (FFMIN(pitch[0], pitch[1]) > 32)
s->aw_pitch_range += 8;
and
s->aw_pitch_range = FFMIN(pitch[0], pitch[1]) > 32 ? 24 : 16;
are a lot more readable.
> + n = 0;
> + for (offset = start_offset[bits] - 11;
> + offset < MAX_FRAMESIZE && n < 11;
> + offset += pitch[offset >= MAX_FRAMESIZE / 2])
> + off_table[n++] = offset >= 0 ? offset : NO_OFFSET;
> + while (n < 11)
> + off_table[n++] = NO_OFFSET;
Argl *drops from chair on floor*
How about making off_table int16_t, NO_OFFSET -1
and doing
memset(off_table, 0xff, sizeof(off_table);
for (n = 0; n < 11 && offset < MAX_FRAMESIZE; n++)
{
if (offset >= 0)
off_table[n] = offset;
offset += pitch[offset >= MAX_FRAMESIZE / 2];
}
Assuming pitch can be negative, else further simplification
would of course be possible.
Well, actually even if not you could probably do some stuff,
but for a loop that's only done 11 times it's not worth it I guess.
> + if (first_idx[0] > 0)
> + while (s->aw_first_pulse_off[0] - pitch[0] + s->aw_pitch_range > 0)
> + s->aw_first_pulse_off[0] -= pitch[0];
> + if (first_idx[1] > 0)
> + while (s->aw_first_pulse_off[1] - pitch[1] + s->aw_pitch_range > 0)
> + s->aw_first_pulse_off[1] -= pitch[1];
Code duplication?
> + if (s->aw_n_pulses[0]) {
> + if (block_idx == 0) {
> + range = 32;
> + } else { ///< block_idx == 1
no pointless doxy syntax please.
> +#define BIT_IS_SET(idx) use_mask[idx >> 5] & (1 << (idx & 31))
> +#define UNSET_BIT(idx) use_mask[idx >> 5] &= ~(1 << (idx & 31))
> +
> + if (pulse_off != NO_OFFSET)
> + for (n = 0; n < 11; n++) {
> + m = pulse_off + n * fcb->pitch_lag;
> + for (idx = m; idx < m + s->aw_pitch_range; idx++)
> + if (idx >= 0 && idx < MAX_FRAMESIZE / 2) UNSET_BIT(idx);
Not sure about that.
> + for (idx = 0; idx < MAX_FRAMESIZE / 2; idx++)
> + if (BIT_IS_SET(idx)) break;
But that looks about like the most inefficient way to scan 80 bits for
a non-zero one ever conceived.
> + if (s->aw_pitch_range == 24) { ///< 3 pulses, 1:sign + 3:index each
> + } else { ///< 4 pulses, 1:sign + 2:index each
I guess you can find all of those on your own.
> + while (fcb->x[fcb->n] < 0)
> + fcb->x[fcb->n] += fcb->pitch_lag;
What are the values here? This might be a particularly slow way to calculate
if (fcb->x[fcb->n] < 0)
fcb->x[fcb->n] = (fcb->x[fcb->n] % fcb->pitch_lag) + fcb->pitch_lag;
> + } else {
> + int num2 = (val & 0x1FF) >> 1, delta, idx;
> +
> + if (num2 < 79) delta = 1;
> + else if (num2 < 156) delta = 2;
> + else if (num2 < 231) delta = 3;
> + else delta = 4;
> + idx = (delta * delta + num2) % 80;
What?
> + delta += delta - 1;
Ah. Still, a multiplication to calculate the square of
values 1 to 4 is maybe a bit overkill.
> + fcb->x[fcb->n] = idx - delta;
> + fcb->y[fcb->n++] = v;
> + fcb->x[fcb->n] = idx;
> + fcb->y[fcb->n++] = (val & 1) ? -v : v;
The increments on a separate line IMHO would be more readable.
> + for (n = 0; n < size; n++)
> + excitation[n] = ff_wmavoice_std_codebook[r_idx + n] * gain;
No dsputil function for that I guess?
> + memset(pulses, 0, sizeof(float) * size);
sizeof(*pulses)
> + float pulse = get_bits1(gb) ? 1.0 : -1.0;
I've seen that a lot. Maybe worth optimizing?
> + memmove(&s->gain_pred_err[8 >> frame_desc->log_n_blocks],
> + s->gain_pred_err,
> + sizeof(float) * (6 - (8 >> frame_desc->log_n_blocks)));
sizeof(type) is rarely useful, maybe check them all.
> + for (n = 0; n < 8 >> frame_desc->log_n_blocks; n++)
> + s->gain_pred_err[n] = pred_err;
That 8 >> ... is used a lot, give it a variable with a proper name.
> + int pitch_sh8 = (s->last_pitch_val << 8) +
> + ((s->pitch_diff_sh16 * (block_idx * size + n)) >> 8);
Maybe split that in a few lines.
> + ff_acelp_interpolatef(&excitation[n], &excitation[n - pitch],
> + ff_wmavoice_ipol1_coeffs, 17,
> + (idx >= 0) | (abs(idx) << 1), 9, 1);
Maybe FFABS? Also that kind of thing is quite common, there may be a function for it.
> + for (n = 0; n < size; n++)
> + excitation[n] = excitation[n - block_pitch];
Note: I think I have seen this a few times already, this applies to all cases.
This is a quite inefficient way to do it, have a look at
memcpy_backptr (and possibly optimize that, too - I think we have a few functions
available now that help like AV_COPY).
Remaining part not reviewed, I'm too tired now.
More information about the ffmpeg-devel
mailing list