[FFmpeg-devel] [PATCH] WMA Voice decoder

Wed Jan 20 22:52:23 CET 2010

Hi,

2010/1/20 M?ns Rullg?rd <mans at mansr.com>:
> "Ronald S. Bultje" <rsbultje at gmail.com> writes:
>> +static const struct frame_type_desc {
>> + ? ?short ? n_blocks, ? ? ///< amount of blocks per frame
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< (each block contains 160/#n_blocks samples)
>> + ? ? ? ? ? ?acb_type, ? ? ///< Adaptive codebook type in frame/block:
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - 0: fixed codebook with per-block/frame gain,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - 1: adaptive codebook with per-frame pitch,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - 2: adaptive codebook with per-block pitch
>> + ? ? ? ? ? ?fcb_type, ? ? ///< Fixed codebook type in frame/block:
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - ? 0: hardcoded codebook, per-frame gain,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - ? 1: hardcoded codebook, per-block gain,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - ? 2: pitch-adaptive window (AW) pulse signal,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - 4-6: innovation (fixed) codebook pulses
>> + ? ? ? ? ? ?dbl_pulses, ? ///< how many pulse vectors have pulse pairs
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< (rather than just one single pulse)
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< only if #fcb_type >= 4 && <= 6
>> + ? ? ? ? ? ?frame_size; ? ///< the amount of bits that make up the block
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< data (per frame)
>> +} frame_descs[17] = {
>> + ? ?{ 1, 0, 0, 0, ? 0 },
>> + ? ?{ 2, 0, 1, 0, ?28 },
>> + ? ?{ 2, 1, 2, 0, ?46 },
>> + ? ?{ 2, 1, 4, 2, ?80 },
>> + ? ?{ 2, 1, 4, 5, 104 },
>> + ? ?{ 4, 1, 5, 0, 108 },
>> + ? ?{ 4, 1, 5, 2, 132 },
>> + ? ?{ 4, 1, 5, 5, 168 },
>> + ? ?{ 2, 2, 4, 0, ?64 },
>> + ? ?{ 2, 2, 4, 2, ?80 },
>> + ? ?{ 2, 2, 4, 5, 104 },
>> + ? ?{ 4, 2, 5, 0, 108 },
>> + ? ?{ 4, 2, 5, 2, 132 },
>> + ? ?{ 4, 2, 5, 5, 168 },
>> + ? ?{ 8, 2, 6, 0, 176 },
>> + ? ?{ 8, 2, 6, 2, 208 },
>> + ? ?{ 8, 2, 6, 5, 256 }
>> +};
>
> I suggest splitting the struct declaration from the frame_descs[]
> definition. ?What you have there looks a little odd to me, though it
> will of course work correctly.

So the reason I did it this way is because other than this one use,
the structure isn't really very meaningful, and this way I can
doxument both struct and variable at the same time.

>> +static int
>> +get_vbm_bits(GetBitContext *gb)
>> +{
>> + ? ?int n, res;
>> +
>> + ? ?for (n = 0; ; n++) {
>> + ? ? ? ?res = get_bits(gb, 2);
>> + ? ? ? ?if (res < 3 || n == 6 /** don't increase n to 7 */)
>> + ? ? ? ? ? ?break;
>> + ? ?}
>> +
>> + ? ?return 3 * n + res;
>> +}
>
> Is this called a lot? ?If yes, it can be optimised.

Once per frame (160 samples), so not really.

>> +static void
>> +dequant_lsps(double *lsps, int num,
>> + ? ? ? ? ? ? const uint16_t *values, const uint16_t *sizes, int n_stages,
>> + ? ? ? ? ? ? const uint8_t *table, const double *mul_q, const double *base_q)
>> +{
>> + ? ?int n, m;
>> +
>> + ? ?for (n = 0; n < num; n++) lsps[n] = 0.0;
>> + ? ?for (n = 0; n < n_stages; table += sizes[n++] * num) {
>> + ? ? ? ?for (m = 0; m < num; m++)
>> + ? ? ? ? ? ?lsps[m] += base_q[n] + mul_q[n] * table[m + values[n] * num];
>> + ? ?}
>> +}
>
> Does this really need double precision? ?Certainly the inputs could be
> single-precision even if the output needs more. ?The input values it
> is called with look like single should be enough.

This is the "some functions take double" thing. I haven't tested these
small arrays with double vs. float because they are small. My
hypothesis would be that it makes no difference, and I in fact think
we should convert all these functions (including lsp2lpc) to float.

> Another thing worth trying is storing the tables as the top 16 bits of
> floats directly (the low 16 bits are all zeros for values up to 256).
> While this doubles the table size, it avoids the int to float
> conversion step.

You mean "table" as variable right? It's HUGE, I mean HUGE, so I
prefer to keep as uint8_t. It probably takes about 90% of
wmavoice_data.h. I agre this is suboptimal, but don't really think
there's an easy way around.

> This might use a bit of simd too.
>
> Is there any pattern in those tables that could be exploited?

I've gotten rid of about as much as I could see already (the data
tables in the binary are quite a bit bigger), but if anyone else sees
more that I can get rid of, I'd love to hear it.

>> + ? ? ? ? ? ? ? ?for (n = 0; n < size; n++)
>> + ? ? ? ? ? ? ? ? ? ?excitation[n] = excitation[n - block_pitch];
>
> What are possible values for block_pitch?

A typical value would be 40. I think you're trying to see if I could
use memmove(), and no I tried but that fails because of the
"non-destructiveness" of it (the destructiveness, or rather
continuation, of the signal is intentional). As you'd expect, memcpy()
has the same issue, which is why I kept the manual loop in place.

>> + ? ?/** convert interpolated LSPs to LPCs */
>> + ? ?fac = (block_idx + 0.5) / frame_desc->n_blocks;
>> + ? ?for (n = 0; n < s->lsps; n++) ///< LSF -> LSP
>> + ? ? ? ?i_lsps[n] = cos(prev_lsps[n] + fac * (lsps[n] - prev_lsps[n]));
>
> Calling trig functions is generally bad. ?Perhaps it can't be avoided
> here though.

You can't really prevent that, I'm affraid, or else things like
ensuring min-distance wouldn't work as intended.

>> + ? ? ? ? ? ? ? ?pitch[n] = (fac ? ? ? ? ? ? ? ? * s->cur_pitch_val +
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?(n_blocks_x2 - fac) * s->last_pitch_val +
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?frame_descs[bd_idx].n_blocks) / n_blocks_x2;
>
> These could use MUL16/MAC16.

Stupid question: why? (I seem to recall Michael claiming at some point
that he didn't want functions / macros that would basically do
something like this, I think he literally gave multiplication as an
example, and lo-and-behold, there's MUL16! - so why is this
preferrably over a generic multiplication?)

[..]
>> + ? ? ? ?synth_block(ctx, gb, n, block_nsamples, bl_pitch_sh2,
>> + ? ? ? ? ? ? ? ? ? ?lsps, prev_lsps, &frame_descs[bd_idx],
>> + ? ? ? ? ? ? ? ? ? ?&excitation[n * block_nsamples],
>> + ? ? ? ? ? ? ? ? ? ?&synth[n * block_nsamples]);
>> + ? ?}
>
> Does gcc split this loop in two, one for acb_type==2 and one for !=2?
> If not, consider doing it manually. ?That if() block is big that it
> probably makes a difference.

gcc inlines all of these into a massive synth_superframe(). Since
unlooping it would essentially result in duplicating the contents of
synth_block into this massive inlined func (which doesn't occur), I
think I'll conclude that it doesn't.

But if it really inlines it, like ALL of it, is duplicating it such a good idea?

>> +static void copy_bits(PutBitContext *pb,
>> + ? ? ? ? ? ? ? ? ? ? ?const uint8_t *data, int size,
>> + ? ? ? ? ? ? ? ? ? ? ?GetBitContext *gb, int nbits)
>> +{
>> + ? ?int rmn_bytes, rmn_bits;
>> +
>> + ? ?rmn_bits = rmn_bytes = get_bits_left(gb);
>> + ? ?if (rmn_bits < nbits)
>> + ? ? ? ?return;
>> + ? ?rmn_bits &= 7; rmn_bytes >>= 3;
>> + ? ?if ((rmn_bits = FFMIN(rmn_bits, nbits)) > 0)
>> + ? ? ? ?put_bits(pb, rmn_bits, get_bits(gb, rmn_bits));
>> + ? ?ff_copy_bits(pb, data + size - rmn_bytes,
>> + ? ? ? ? ? ? ? ? FFMIN(nbits - rmn_bits, rmn_bytes << 3));
>> +}
>
> This should probably go somewhere more generic.

Yes, in wma.c, because wmapro & wma1/2 also use it. I was hoping
nobody would notice and I could do that after the initial commit, but
I guess I'm doomed. Damn it. Pretty please?

>> +/**
>> + * @file libavcodec/wmavoice_data.h
>> + * @brief Windows Media Voice (WMAVoice) tables
>> + * @author Ronald S. Bultje <rsbultje at gmail.com>
>> + */
>
> These tables are huge. ?Is there any structure in them that could be
> used to reduce the size?

Vitor and I are seeing if we can use generic sinc window tables for
the ipol tables at the end, but other than that, 90% of this is dq
tables and I'm affraid they're about as small as possible already...
:-(.

Ronald