[FFmpeg-devel] [PATCH] WMA Voice decoder

Sat Jan 30 23:03:26 CET 2010

Hi Vitor,

On Sat, Jan 23, 2010 at 12:06 AM, Vitor Sessak <vitor1001 at gmail.com> wrote:
> Ronald S. Bultje wrote:
>> +            fcb_type,     ///< Fixed codebook type in frame/block:
>> +                          ///< -   0: hardcoded codebook, per-frame gain,
>
> fcb_type == 0 is silence + confort noise, no? Also I think those should be
> an enum...

Yes, comment changed.

>> +    int frequency_domain;         ///< defines which table to use during
>> APF
>> +                                  ///< frequency domain filtering [0-7]
>> +    int spectrum_corr;            ///< whether to do spectrum tilt
>> correction
>> +                                  ///< in APF
>> +    int apf_project_mode;         ///< defines the filter projection mode
>> in
>> +                                  ///< APF [0-7]
>
> I think it is better to leave those for the patch adding APF.

OK, removed (for now).

>> +    int cur_pitch_val;            ///< pitch value of the current frame
>
> can be a local var

Done.

>> +    float silence_gain;           ///< set for use in blocks if acb_type
>> == 0
>
> Do not need to be on the context, but I don't know if making this local
> wouldn't make the code uglier.

I could consider adding (like you did in SIPR) a WMAVoiceFrame struct.
Left for now, I don't want to add more arguments to the function, I
don't think it's the right solution.

>> +    int frame_cntr;               ///< current frame index [0 - 0xFFFF]
>
> Please add to the doxy that this is only used for prng

Done.

>> +static av_cold int decode_vbmtree(WMAVoiceContext *s)
[..]
> I'd prefer if you pass a GetBitContext and s->vbm_tree as parameters. Also,
> it looks reasonable to me to make the GetBitContext of decoder_init() a
> local var, to avoid having s->gb meaning pointing to semantically different
> things (extradata on init and frame_data in decoding).

Did the first thing. I don't agree on the second to be honest, they're
semantically different but mutually exclusive, I'd consider it a
memory waster. I modified the doxy to clear this up.

>> +     * - byte 19-22: flags field (annoyingly in LE; see below for known
>> +     *               values),
>
> Hmm, isn't the endianness of the flags a convention you can choose?

Only for single-bit fields, some entries (e.g. the APF ones) cover
multiple bits and in fact cross byte boundaries, then endianess comes
into play, and they're LE.

>> +    if (decode_vbmtree(s) < 0) {
>> +        av_log(ctx, AV_LOG_ERROR, "Invalid VBM tree\n");
>> +        return -1;
>> +    }
>
> I think a "Invalid metadata" or "Invalid VBM tree, broken metadata?" could
> be more handy for someone debugging a demuxer.

Done.

>> +    s->min_pitch_val    = ((int) ((ctx->sample_rate << 8) * 0.0025) + 50)
>> >> 8;
>> +    s->max_pitch_val    = ((int) ((ctx->sample_rate << 8) * 0.0185) + 50)
>> >> 8;
>
> I think this is better done with integer math

I could, don't really have a strong opinion here. The binary uses
bit-truncation math here which I refuse to use out of principle (i.e.
int x int -> int64, then >> 20  and << 1). Divisions are slow.
Anything else we could use here? /= 400 and whatever for .0185?
Divisions are slow?

Maybe I should look up what the binary dll on windows does. My
impression (from other places) is that it's actually using float here.

>> +    if (s->history_nsamples > MAX_SIGNAL_HISTORY) {
>> +        av_log(ctx, AV_LOG_ERROR, "Signal history too big: %d (max=%d),
>> probably broken file\n",
>> +               s->history_nsamples, MAX_SIGNAL_HISTORY);
>
> "Unsupported sample rate: %d"?

Done.

>> +static int get_vbm_bits(GetBitContext *gb)
[..]
> I've never actually used the {init,get}_vlc() functions, so I'm not sure,
> but it looks like they could be useful here.

I think that does something else, and has a max_depth==3 (this one has
6), or did I misread that? Michael?

>> +    static const uint16_t vec_sizes[3] = { 0x80, 0x40, 0x40 };
>
> I prefer decimal

Changed, same below.

>> +static void dequant_lsp16r(GetBitContext *gb,
>> +                           double *i_lsps, const double *old,
>> +                           double *a1, double *a2, int q_mode)
[..]
> Semi-duplicated code, but hard to see how to factor it...

Yes, I have the same problem, I think I factored out what I could and
the rest is duplicated...

>> +    for (offset = start_offset[bits] - 11,                 n = 0;
>> +         offset < MAX_FRAMESIZE + s->aw_pitch_range / 2 && n < 11;
>> +         offset += pitch[offset >= MAX_FRAMESIZE / 2],     n++)
>> +        off_table[n] = offset;
>
> Please do not write all this in a single statement.

Changed this.

>> +    while (n < 11) off_table[n++] = NO_OFFSET;
>
> Line break

Done.

>> +    s->aw_n_pulses[0]        = s->aw_n_pulses[1]        = 0;
>> +    s->aw_first_pulse_off[0] = s->aw_first_pulse_off[1] = NO_OFFSET;
>> +    first_idx[0]             = first_idx[1]             = 0;
>> +    for (n = 0; n < 11; n++) {
>> +        if (off_table[n] >= MAX_FRAMESIZE / 2) {
>> +            if (off_table[n] < MAX_FRAMESIZE) { ///< block[1]
>> +                if (s->aw_n_pulses[1]++ == 0) {
>> +                    s->aw_first_pulse_off[1] = off_table[n] -
>> +                        (MAX_FRAMESIZE + s->aw_pitch_range) / 2;
>> +                    first_idx[1]             = n;
>> +                }
>> +            }
>
> Does off_table[n] >= MAX_FRAMESIZE have the same behavior than off_table[n]
> == NO_OFFSET? If yes, it can be simpified by making off_table[n] >=
> MAX_FRAMESIZE impossible.
>
>> +        } else if (off_table[n] >= 0) { ///< block[0]
>> +            if (s->aw_n_pulses[0]++ == 0) {
>> +                s->aw_first_pulse_off[0] =
>> +                    off_table[n] - s->aw_pitch_range / 2;
>> +                first_idx[0]             = n;
>> +            }
>> +        }
>
> Also
>
> int idx = off_table[n] >= MAX_FRAMESIZE / 2;
> if (s->aw_n_pulses[idx]++ == 0) {
>    s->aw_first_pulse_off[idx] = off_table[n] -
>        (MAX_FRAMESIZE + s->aw_pitch_range) / 2;
>    first_idx[idx]             = n;
> }

Both done.

>> +    if (pulse_off != NO_OFFSET) for (n = 0; n < 11; n++) {
>
> line break

Yes.

>> +        for (idx = m; idx < m + s->aw_pitch_range; idx++)
>> +            if (idx >= 0 && idx < size) UNSET_BIT(idx);
>> +    }
>
> Is this bit array there just to optimize a range-checking?

No, decrease stack size usage.

>> +    for (n = 0, m = 0; m < 500 && n < range; pulse_start++, m++) {
>> +        for (idx = pulse_start; idx < 0; idx += pitch);
>> +        if (idx >= size) {
>> +            for (idx = 0; idx < size; idx++)
>> +                if (BIT_IS_SET(idx)) break;
>> +            if (idx >= size) continue;
>> +        }
>> +        if (BIT_IS_SET(idx)) {
>> +            arr2[n++] = idx;
>> +            UNSET_BIT(idx);
>> +        }
>> +    }
>
> Isn't this whole loop a NOP when pulse_off == NO_OFFSET? I'd say this
> calculation need more cleanup...

I removed arr2, but no, in fact there's no exclusions (exclusion =
unset bit) if pulse_off == NO_OFFSET. I tried to simplify it, let me
know if this is better.

>> +    idx = get_bits(gb, s->aw_n_pulses[0] ? 5 - 2 * block_idx : 4);
>> +    v   = get_bits1(gb) ? -1.0 : 1.0;
>> +    for (n = arr2[idx]; n < size; n += pitch)
>> +        out[n] += v;
>
> AMRFixed.n = 1;
> AMRFixed.x[0] = n;
> AMRFixed.y[0] = v;
> AMRFixed.pitch_lag = pitch;
> AMRFixed.pitch_fac = 1.0;
>
> ff_set_fixed_vector()

Changed, same below.

>> + * Generate a random number that is different for each frame/block
>> + * and can be used as an index for a table with 1000 entries, if
>> + * we want to read @block_size entries following.
>
> Doxy formatting

Changed.

>> + * @returns a unique random number for each @block_cntr/@block_num
>
> It cannot be unique. Imagine if you have more than 1000 frames.
>
> I'd say something like
>
> "Returns a random number in [0, 1000-block_size] calculated from frame_cntr
> and block_size". Or even better, you can pass max == 1000 - block_size as a
> parameter and generate a prn between [0, max].

Changed to something as suggested.

>> +    int bd_idx = s->vbm_tree[get_vbm_bits(gb)],
>> +        block_nsamples = MAX_FRAMESIZE / frame_descs[bd_idx].n_blocks;
>
> Is frame size ever different from MAX_FRAMESIZE?

No.

>> +    memcpy(synth,      s->synth_history,
>> +           s->lsps             * sizeof(float));
>> +    memcpy(excitation, s->excitation_history,
>> +           s->history_nsamples * sizeof(float));
>
> I slightly prefer (one memcpy less) instead of
>
> memcpy(synth, s->synth_hist, mem_size);
> decode(synth);
> memcpy(s->synth_hist, synth + size - mem_size, mem_size);
>
> doing
>
> decode(s->synth);
> memmove(s->synth, s->synth + size, mem_size);

Didn't really get a reaction. I left it as-is for now, I'll change if
you prefer the other one, still, or if others express similar
sentiment. Again, I don't care much either way, but this decreases mem
usage.

>> +static int parse_packet_header(WMAVoiceContext *s)
[..]
>> +    if (get_bits_left(gb) < 11)
>> +        return 1;
>
> Can this ever happen?

With specially crafted streams, yes.

Will attach an updated patch in a bit.

Ronald