[FFmpeg-devel] [PATCH] WMA Voice decoder

Thu Feb 11 04:25:58 CET 2010

Hi,

On Mon, Feb 1, 2010 at 2:35 PM, Reimar D?ffinger
<Reimar.Doeffinger at gmx.de> wrote:
> On Sun, Jan 31, 2010 at 09:46:43PM -0500, Ronald S. Bultje wrote:
>> On Sat, Jan 30, 2010 at 6:48 PM, Reimar D?ffinger
>> <Reimar.Doeffinger at gmx.de> wrote:
>> > "short"? Also specifying the type only once is ugly.
>> > Also if you use frame_size with a shift they all can be uint8_t
>> > (though probably it's better to just "waste" those 34 bytes and
>> > leave frame_size 16 bits, but the others don't need to be.
>>
>> OK, so I changed this to be holdable in 32bits per frame type, maybe
>> that's a bit too much just let me know how far I should go.
>
> I'm generally in favour of the most readable, at least as long as we
> are talking about saving a few _bytes_, if it was a kB that'd be different.

As we discussed, this is nearing bikeshed-status. It's not
performance-relevant and small codec binary size would therefore be
worthy, so I'll leave it as is.

>> >> + ? ?memset(cntr, ? ? ?0, sizeof(cntr));
>> >> + ? ?for (n = 0; n < 17; n++) {
>> >> + ? ? ? ?res = get_bits(gb, 3);
>> >> + ? ? ? ?if (cntr[res] >= 3 + (res == 7))
>> >> + ? ? ? ? ? ?return -1;
>> >
>> > What is the reason this can't just be cntr[res] > 3?
>> > (assuming performance matters, documenting can't hurt
>> > either way).
>>
>> It might invalidly mark invalid bitstreams as valid. I guess we don't
>> care, so changed.
>
> Unless you intend to do something useful with that information there's
> no point.
> If you do intend, I'd say it depends on the speed difference it makes.

OK, so removed (i.e. it's now > 3).

>> >> + ? ?if (flags & 0x1000) {
>> >
>> > A name for that wouldn't hurt.
>>
>> I wasn't sure what you meant here. I added a comment indicating what
>> the flag means, but I might have misunderstood you.
>
> Well, I usually consider an appropriately named variable name better
> than a comment, i.e.
> stupid_flag = flags & 0x1000;
> or
> flags & STUPID_FLAG;
> instead of
> flags & 0x1000 // stupid flag

I went for the first solution.

>> >> + ? ? ? ?for (m = 0; m < num; m++)
>> >> + ? ? ? ? ? ?lsps[m] += base_q[n] + mul_q[n] * table[m + values[n] * num];
>> >
>> > I consider it better to extract constant stuff like values[n]*num outside the loop.
>> > Particularly since due to aliasing rules the compiler might not be able to
>> > to pull e.g. the base_q[n] out without that (hm, I admit I don't know for sure
>> > if the const might be enough for that).
>>
>> I did this for the values[n] * num. For the base_q[n], it seems
>> overkill, I mean, it's marked const so the compiler should know
>> better. I think expanding base_q[n] outside the loop is too little
>> honour for gcc (maybe deserved, but still).
>
> Um, the thing is that lsps and base_q is the same type, if you don't
> add a restrict keyword the compiler usually simply _can't_ know that
> lsps[m] is not the same as base_q[n] and thus _has_ to reload
> base_q[n] each time.
> I agree though that possibly the better solution is to add restrict (even
> though I consider gcc stupid enough that I expect at least some version
> to still get it wrong).

Seems gcc didn't use restrict, so I expanded mul_q[n] and base_q[n]
before the inner loop.

>> >> + ? ?dequant_lsps(&lsps[5], ?5, &v[2], &vec_sizes[2], 2,
>> >> + ? ? ? ? ? ? ? ? ff_wmavoice_dq_lsp16i2, &mul_lsf[2], &base_lsf[2]);
>> >
>> > I think lsps + 5, v + 2 etc. is the more common style in FFmpeg.
>>
>> Uhm, so that will be a huge change, I use &..[off] inistead of ..+off
>> everywhere in my code, not just here but also in rdt.c etc. Is that a
>> big deal?
>
> Not for me, just saying that it doesn't fit so nicely in overall FFmpeg style
> I think.

Left as-is for now. If others complain also, I'll change. :-).

>> >> + ? ? ? ? ? ?for (idx = 0; idx < MAX_FRAMESIZE / 2; idx++)
>> >> + ? ? ? ? ? ? ? ?if (BIT_IS_SET(idx)) break;
>> >
>> > But that looks about like the most inefficient way to scan 80 bits for
>> > a non-zero one ever conceived.
>>
>> OK, so what do I use then? (I really don't know. :-).)
>
> What exactly is best I can't say, and that sample code probably has quite a few bugs,
> but something along the lines of
> uint64_t tmp = AV_RN64(buffer);
> if (tmp) {
> #ifndef HAVE_BIGENDIAN
> bswap_64(tmp);
> #endif
> if (tmp >> 32)
> return 32 - av_log2(tmp >> 32);
> else
> return 64 - av_log2(tmp);
> }
> return 80 - av_log2_16bit(AV_RB16(buffer + 8));
> (take care to handle the case where the return values is 80, i.e. everything is 0)

I modified the code to use av_log2_16bit(). I used 16bit instead of
regular av_log because it has the nice feature that each set1() pulse
with, excluded in set2(), is 16 or 24bits, and thus I only need a
single shift in the bit exclusion code. The 32bit version looked much
more complicated.

>> >> + ? ? ? ?if (s->aw_pitch_range == 24) { ///< 3 pulses, 1:sign + 3:index each
>> >> + ? ? ? ?} else { ///< 4 pulses, 1:sign + 2:index each
>> >
>> > I guess you can find all of those on your own.
>>
>> ???
>
> Doxy comments that make no sense because they will only confuse doxygen

All in-code doxy pollution removed.

>> >> + ? ? ? ? ? ?while (fcb->x[fcb->n] < 0)
>> >> + ? ? ? ? ? ? ? ?fcb->x[fcb->n] += fcb->pitch_lag;
>> >
>> > What are the values here? This might be a particularly slow way to calculate
>> > if (fcb->x[fcb->n] < 0)
>> > ? ?fcb->x[fcb->n] = (fcb->x[fcb->n] % fcb->pitch_lag) + fcb->pitch_lag;
>>
>> They could be 1-2 pitch values below zero, so -10, -20 or so.
>
> Then it might make sense (speed wise, it might be too ugly to do it if the speed
> does not make a difference) to do
> if (fcb->x[fcb->n] < 0) {
> ?? ?fcb->x[fcb->n] += fcb->pitch_lag;
> ? ?if (fcb->x[fcb->n] < 0) {
> ?? ? ? ?fcb->x[fcb->n] += fcb->pitch_lag;
> }

As mentioned in my other email, I can make it s/while/if/ but then
min_samplerate would be bumped from 322 to 3122. For every
decrease-by-two of samplerate (e.g. to +/-1800), I'd need one
additional if. Since generally this code doesn't trigger at all, it
might in fact be worth to just leave it as is, although I'm not sure
how useful 3000Hz voice data is...

>> >> + ? ? ? ? ? ?float pulse = get_bits1(gb) ? 1.0 : -1.0;
>> >
>> > I've seen that a lot. Maybe worth optimizing?
>>
>> Same as above (also goes for AMR, SIPR).
>
> Well, adding a function for it instead of duplicating code might still make sense
> already now, even if you optimize only later. Not important though.

Left for now, but yes we could optimize this later.

>> >> + ? ? ? ? ? ?ff_acelp_interpolatef(&excitation[n], &excitation[n - pitch],
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ff_wmavoice_ipol1_coeffs, 17,
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(idx >= 0) | (abs(idx) << 1), 9, 1);
>> >
>> > Maybe FFABS? Also that kind of thing is quite common, there may be a function for it.
>>
>> it's hard to test the speed implifcation of that because it's so
>> insignificant. I think at some point I said we should test abs() vs.
>> FFABS(), there's no clear guideline for that right now.
>>
>> I couldn't find a function, but I may just be inexperienced in finding
>> them. :-).
>
> There might be only to decode a value encoded that way.

As mentioned in another email, I changed the table so now it reads "5
+ idx" instead of that complex statement above.

New patch coming.

Ronald