[FFmpeg-devel] [PATCH] WMA Voice decoder
Ronald S. Bultje
rsbultje
Sat Jan 30 23:03:26 CET 2010
Hi Vitor,
On Sat, Jan 23, 2010 at 12:06 AM, Vitor Sessak <vitor1001 at gmail.com> wrote:
> Ronald S. Bultje wrote:
>> + fcb_type, ///< Fixed codebook type in frame/block:
>> + ///< - 0: hardcoded codebook, per-frame gain,
>
> fcb_type == 0 is silence + confort noise, no? Also I think those should be
> an enum...
Yes, comment changed.
>> + int frequency_domain; ///< defines which table to use during
>> APF
>> + ///< frequency domain filtering [0-7]
>> + int spectrum_corr; ///< whether to do spectrum tilt
>> correction
>> + ///< in APF
>> + int apf_project_mode; ///< defines the filter projection mode
>> in
>> + ///< APF [0-7]
>
> I think it is better to leave those for the patch adding APF.
OK, removed (for now).
>> + int cur_pitch_val; ///< pitch value of the current frame
>
> can be a local var
Done.
>> + float silence_gain; ///< set for use in blocks if acb_type
>> == 0
>
> Do not need to be on the context, but I don't know if making this local
> wouldn't make the code uglier.
I could consider adding (like you did in SIPR) a WMAVoiceFrame struct.
Left for now, I don't want to add more arguments to the function, I
don't think it's the right solution.
>> + int frame_cntr; ///< current frame index [0 - 0xFFFF]
>
> Please add to the doxy that this is only used for prng
Done.
>> +static av_cold int decode_vbmtree(WMAVoiceContext *s)
[..]
> I'd prefer if you pass a GetBitContext and s->vbm_tree as parameters. Also,
> it looks reasonable to me to make the GetBitContext of decoder_init() a
> local var, to avoid having s->gb meaning pointing to semantically different
> things (extradata on init and frame_data in decoding).
Did the first thing. I don't agree on the second to be honest, they're
semantically different but mutually exclusive, I'd consider it a
memory waster. I modified the doxy to clear this up.
>> + * - byte 19-22: flags field (annoyingly in LE; see below for known
>> + * values),
>
> Hmm, isn't the endianness of the flags a convention you can choose?
Only for single-bit fields, some entries (e.g. the APF ones) cover
multiple bits and in fact cross byte boundaries, then endianess comes
into play, and they're LE.
>> + if (decode_vbmtree(s) < 0) {
>> + av_log(ctx, AV_LOG_ERROR, "Invalid VBM tree\n");
>> + return -1;
>> + }
>
> I think a "Invalid metadata" or "Invalid VBM tree, broken metadata?" could
> be more handy for someone debugging a demuxer.
Done.
>> + s->min_pitch_val = ((int) ((ctx->sample_rate << 8) * 0.0025) + 50)
>> >> 8;
>> + s->max_pitch_val = ((int) ((ctx->sample_rate << 8) * 0.0185) + 50)
>> >> 8;
>
> I think this is better done with integer math
I could, don't really have a strong opinion here. The binary uses
bit-truncation math here which I refuse to use out of principle (i.e.
int x int -> int64, then >> 20 and << 1). Divisions are slow.
Anything else we could use here? /= 400 and whatever for .0185?
Divisions are slow?
Maybe I should look up what the binary dll on windows does. My
impression (from other places) is that it's actually using float here.
>> + if (s->history_nsamples > MAX_SIGNAL_HISTORY) {
>> + av_log(ctx, AV_LOG_ERROR, "Signal history too big: %d (max=%d),
>> probably broken file\n",
>> + s->history_nsamples, MAX_SIGNAL_HISTORY);
>
> "Unsupported sample rate: %d"?
Done.
>> +static int get_vbm_bits(GetBitContext *gb)
[..]
> I've never actually used the {init,get}_vlc() functions, so I'm not sure,
> but it looks like they could be useful here.
I think that does something else, and has a max_depth==3 (this one has
6), or did I misread that? Michael?
>> + static const uint16_t vec_sizes[3] = { 0x80, 0x40, 0x40 };
>
> I prefer decimal
Changed, same below.
>> +static void dequant_lsp16r(GetBitContext *gb,
>> + double *i_lsps, const double *old,
>> + double *a1, double *a2, int q_mode)
[..]
> Semi-duplicated code, but hard to see how to factor it...
Yes, I have the same problem, I think I factored out what I could and
the rest is duplicated...
>> + for (offset = start_offset[bits] - 11, n = 0;
>> + offset < MAX_FRAMESIZE + s->aw_pitch_range / 2 && n < 11;
>> + offset += pitch[offset >= MAX_FRAMESIZE / 2], n++)
>> + off_table[n] = offset;
>
> Please do not write all this in a single statement.
Changed this.
>> + while (n < 11) off_table[n++] = NO_OFFSET;
>
> Line break
Done.
>> + s->aw_n_pulses[0] = s->aw_n_pulses[1] = 0;
>> + s->aw_first_pulse_off[0] = s->aw_first_pulse_off[1] = NO_OFFSET;
>> + first_idx[0] = first_idx[1] = 0;
>> + for (n = 0; n < 11; n++) {
>> + if (off_table[n] >= MAX_FRAMESIZE / 2) {
>> + if (off_table[n] < MAX_FRAMESIZE) { ///< block[1]
>> + if (s->aw_n_pulses[1]++ == 0) {
>> + s->aw_first_pulse_off[1] = off_table[n] -
>> + (MAX_FRAMESIZE + s->aw_pitch_range) / 2;
>> + first_idx[1] = n;
>> + }
>> + }
>
> Does off_table[n] >= MAX_FRAMESIZE have the same behavior than off_table[n]
> == NO_OFFSET? If yes, it can be simpified by making off_table[n] >=
> MAX_FRAMESIZE impossible.
>
>> + } else if (off_table[n] >= 0) { ///< block[0]
>> + if (s->aw_n_pulses[0]++ == 0) {
>> + s->aw_first_pulse_off[0] =
>> + off_table[n] - s->aw_pitch_range / 2;
>> + first_idx[0] = n;
>> + }
>> + }
>
> Also
>
> int idx = off_table[n] >= MAX_FRAMESIZE / 2;
> if (s->aw_n_pulses[idx]++ == 0) {
> s->aw_first_pulse_off[idx] = off_table[n] -
> (MAX_FRAMESIZE + s->aw_pitch_range) / 2;
> first_idx[idx] = n;
> }
Both done.
>> + if (pulse_off != NO_OFFSET) for (n = 0; n < 11; n++) {
>
> line break
Yes.
>> + for (idx = m; idx < m + s->aw_pitch_range; idx++)
>> + if (idx >= 0 && idx < size) UNSET_BIT(idx);
>> + }
>
> Is this bit array there just to optimize a range-checking?
No, decrease stack size usage.
>> + for (n = 0, m = 0; m < 500 && n < range; pulse_start++, m++) {
>> + for (idx = pulse_start; idx < 0; idx += pitch);
>> + if (idx >= size) {
>> + for (idx = 0; idx < size; idx++)
>> + if (BIT_IS_SET(idx)) break;
>> + if (idx >= size) continue;
>> + }
>> + if (BIT_IS_SET(idx)) {
>> + arr2[n++] = idx;
>> + UNSET_BIT(idx);
>> + }
>> + }
>
> Isn't this whole loop a NOP when pulse_off == NO_OFFSET? I'd say this
> calculation need more cleanup...
I removed arr2, but no, in fact there's no exclusions (exclusion =
unset bit) if pulse_off == NO_OFFSET. I tried to simplify it, let me
know if this is better.
>> + idx = get_bits(gb, s->aw_n_pulses[0] ? 5 - 2 * block_idx : 4);
>> + v = get_bits1(gb) ? -1.0 : 1.0;
>> + for (n = arr2[idx]; n < size; n += pitch)
>> + out[n] += v;
>
> AMRFixed.n = 1;
> AMRFixed.x[0] = n;
> AMRFixed.y[0] = v;
> AMRFixed.pitch_lag = pitch;
> AMRFixed.pitch_fac = 1.0;
>
> ff_set_fixed_vector()
Changed, same below.
>> + * Generate a random number that is different for each frame/block
>> + * and can be used as an index for a table with 1000 entries, if
>> + * we want to read @block_size entries following.
>
> Doxy formatting
Changed.
>> + * @returns a unique random number for each @block_cntr/@block_num
>
> It cannot be unique. Imagine if you have more than 1000 frames.
>
> I'd say something like
>
> "Returns a random number in [0, 1000-block_size] calculated from frame_cntr
> and block_size". Or even better, you can pass max == 1000 - block_size as a
> parameter and generate a prn between [0, max].
Changed to something as suggested.
>> + int bd_idx = s->vbm_tree[get_vbm_bits(gb)],
>> + block_nsamples = MAX_FRAMESIZE / frame_descs[bd_idx].n_blocks;
>
> Is frame size ever different from MAX_FRAMESIZE?
No.
>> + memcpy(synth, s->synth_history,
>> + s->lsps * sizeof(float));
>> + memcpy(excitation, s->excitation_history,
>> + s->history_nsamples * sizeof(float));
>
> I slightly prefer (one memcpy less) instead of
>
> memcpy(synth, s->synth_hist, mem_size);
> decode(synth);
> memcpy(s->synth_hist, synth + size - mem_size, mem_size);
>
> doing
>
> decode(s->synth);
> memmove(s->synth, s->synth + size, mem_size);
Didn't really get a reaction. I left it as-is for now, I'll change if
you prefer the other one, still, or if others express similar
sentiment. Again, I don't care much either way, but this decreases mem
usage.
>> +static int parse_packet_header(WMAVoiceContext *s)
[..]
>> + if (get_bits_left(gb) < 11)
>> + return 1;
>
> Can this ever happen?
With specially crafted streams, yes.
Will attach an updated patch in a bit.
Ronald
More information about the ffmpeg-devel
mailing list