[FFmpeg-devel] [PATCH] wmapro decoder

Sun Aug 2 15:11:50 CEST 2009

Hi,

> > + * The compressed wmapro bitstream is split into individual packets.
> > + * Every such packet contains one or more wma frames.
> > + * The compressed frames may have a variable length and frames may
> > + * cross packet boundaries.
> > + * Common to all wmapro frames is the number of samples that are stored
> > in + * a frame.
> > + * The number of samples and a few other decode flags are stored
> > + * as extradata that has to be passed to the decoder.
> > + *
> > + * The wmapro frames themselves are again split into a variable number
> > of + * subframes. Every subframe contains the data for 2^N time domain
> > samples + * where N varies between 7 and 12.
> > + *
> > + * The frame layouts for the individual channels of a wma frame does not
> > need + * to be the same.
>
> maybe you want to draw some simple ascii art to show these things ...
>

Done.

> > +/**
> > + * @brief decoder context for a single channel
> > + */
> > +typedef struct {
> > +    int16_t  prev_block_len;                          ///< length of the
> > previous block +    uint8_t  transmit_coefs;
> > +    uint8_t  num_subframes;
> > +    uint16_t subframe_len[MAX_SUBFRAMES];             ///< subframe
> > length in samples
> >
> > +    uint16_t subframe_offset[MAX_SUBFRAMES];          ///< subframe
> > position
>
> a little unclear, position in the stream vs. frame
>
> > +    uint8_t  cur_subframe;                            ///< subframe
> > index
>
> same and it applies to more fields as well i think
>

Fixed.

> > +    int      max_scale_factor;                        ///< maximum scale
> > factor +    int      scale_factors[MAX_BANDS];                ///< scale
> > factor values
>
> doxy seem redundant, they just repeat the var names ...
>

Fixed.

> > +    int      resampled_scale_factors[MAX_BANDS];      ///< scale factors
> > from a previous block +    int16_t  scale_factor_block_len;              
> >    ///< scale factor reference block length +    float*   coeffs;        
> >                          ///< pointer to the decode buffer +   
> > DECLARE_ALIGNED_16(float, out[2*WMAPRO_BLOCK_MAX_SIZE]); ///< output
> > buffer +} WMA3ChannelCtx;
> > +
> > +/**
> > + * @brief channel group for channel transformations
> > + */
> > +typedef struct {
> > +    uint8_t num_channels;                                     ///<
> > number of channels in the group +    int8_t  transform;                  
> >                      ///< controls the type of the transform +    int8_t 
> > transform_band[MAX_BANDS];                        ///< controls if the
> > transform is enabled for a certain band
> >
> > +    float  
> > decorrelation_matrix[WMAPRO_MAX_CHANNELS*WMAPRO_MAX_CHANNELS];
>
> is it simpler if its [WMAPRO_MAX_CHANNELS][WMAPRO_MAX_CHANNELS] ?
>

I don't think so. At the moment the values are stored in the order they are 
used. If it were a 2d array, there would be gaps that need to be handled in 
various places.

> > +/**
> > + *@brief helper function to print the most important members of the
> > context + *@param s context
> > + */
> > +static void av_cold dump_context(WMA3DecodeContext *s)
> > +{
> >
> > +#define PRINT(a,b) av_log(s->avctx,AV_LOG_DEBUG," %s = %d\n", a, b);
> > +#define PRINT_HEX(a,b) av_log(s->avctx,AV_LOG_DEBUG," %s = %x\n", a, b);
>
> vertical align
>

Fixed.

> > +static av_cold int decode_end(AVCodecContext *avctx)

[...]

> function ok, can be commited if you like
>

Committed.

> > +
> > +/**
> > + *@brief Initialize the decoder.
> > + *@param avctx codec context
> > + *@return 0 on success, -1 otherwise
> > + */
> > +static av_cold int decode_init(AVCodecContext *avctx)
> > +{
> > +    WMA3DecodeContext *s = avctx->priv_data;
> > +    uint8_t *edata_ptr = avctx->extradata;
> > +    int16_t* sfb_offsets;
> > +    unsigned int channel_mask;
> > +    int i;
> > +    int log2_num_subframes;
> > +
> > +    s->avctx = avctx;
> > +    dsputil_init(&s->dsp, avctx);
> > +
> > +    avctx->sample_fmt = SAMPLE_FMT_FLT;
> > +
> > +    if (avctx->extradata_size >= 18) {
> >
> > +        s->decode_flags     = AV_RL16(edata_ptr+14);
> > +        channel_mask    = AV_RL32(edata_ptr+2);
> > +        s->bits_per_sample = AV_RL16(edata_ptr);
>
> vertical align
>

Fixed.

> > +    s->len_prefix = (s->decode_flags & 0x40) >> 6;
>
> the >> is uneeded
>

Removed.

> > +
> > +    if (!s->len_prefix) {
> > +         av_log_ask_for_sample(avctx, "no length prefix\n");
> > +         return AVERROR_INVALIDDATA;
> > +    }
> > +
> > +    /** get frame len */
> > +    s->samples_per_frame = 1 <<
> > ff_wma_get_frame_len_bits(avctx->sample_rate, +                          
> >                                3, s->decode_flags); +
> > +    /** init previous block len */
> > +    for (i=0;i<avctx->channels;i++)
> > +        s->channel[i].prev_block_len = s->samples_per_frame;
> > +
> > +    /** subframe info */
> > +    log2_num_subframes = ((s->decode_flags & 0x38) >> 3);
> > +    s->max_num_subframes = 1 << log2_num_subframes;
> > +    s->num_possible_block_sizes = log2_num_subframes + 1;
> > +    s->min_samples_per_subframe = s->samples_per_frame /
> > s->max_num_subframes; +    s->dynamic_range_compression =
> > (s->decode_flags & 0x80);
>
> these assignments also could be aligned so that = is at the same pos, i
> think this would be more readable ....
>

Fixed.

> > +    for (i=0;i<s->num_possible_block_sizes;i++) {
> > +        int subframe_len = s->samples_per_frame >> i;
> > +        int x;
> > +        int band = 1;
> > +
> > +        sfb_offsets[0] = 0;
> > +
> > +        for (x=0;x < MAX_BANDS-1 && sfb_offsets[band-1] <
> > subframe_len;x++) { +            int offset = (subframe_len * 2 *
> > critical_freq[x])
> > +                          / s->avctx->sample_rate + 2;
> > +            offset &= ~3;
> > +            if ( offset > sfb_offsets[band - 1] )
> > +                sfb_offsets[band++] = offset;
> > +        }
> > +        sfb_offsets[band - 1] = subframe_len;
> > +        s->num_sfb[i] = band - 1;
> > +        sfb_offsets += MAX_BANDS;
> > +    }
>
> looking at the MAX_BANDS i was wonderin if the thing might be better as a
> 2d array?
>

Yes. Also sf_offsets can be a multidimensional array.
Changed.

> > +
> > +
> > +    /** Scale factors can be shared between blocks of different size
> > +        as every block has a different scale factor band layout.
> > +        The matrix sf_offsets is needed to find the correct scale
> > factor. +     */
> > +
> > +    for (i=0;i<s->num_possible_block_sizes;i++) {
> > +        int b;
> >
> > +        for (b=0;b< s->num_sfb[i];b++) {
>
> the whitespace seems inconsistent around < here
>

Fixed.

> >
> > +    dump_context(s);
>
> this should be under some FF_DEBUG_ if() like in the video decoders
> otherwise it can become pretty hard to read the result as there are
> too many av_logs being printed ...
>

Fixed.

> > + *       If the subframes are not evenly split, the algorithm estimates
> > the + *       channels with the lowest number of total samples.
> > + *       Afterwards, for each of these channels a bit is read from the
> >
> > + *       bitstream that indicates if the channel contains a frame with
> > the
>
> frame or subframe ?
>

Subframe. Fixed.

> > +        } else if (s->max_num_subframes == 16) {
> > +            subframe_len_zero_bit = 1;
> > +            subframe_len_bits = 3;
> > +        } else
> > +            subframe_len_bits = av_log2(av_log2(s->max_num_subframes)) +
> > 1;
>
> the special casing of subframe_len_bits seems unneeded
>

Fixed.

> > +static void decode_decorrelation_matrix(WMA3DecodeContext* s,
> > +                                            WMA3ChannelGroup* chgroup)
>
> i wonder if decorrelation_matrix should be a 2d array besides this, this
> function is ok
>

Committed. See above regarding the 2d array.

>
> [...]
>
> > +            /** decode transform type */
> > +            if (chgroup->num_channels == 2) {
> > +                if (get_bits1(&s->gb)) {
> > +                    if (get_bits1(&s->gb)) {
> > +                        av_log_ask_for_sample(s->avctx,
> > +                               "unsupported channel transform type\n");
> > +                    }
> > +                } else {
> >
> > +                    if (s->num_channels == 2) {
> > +                        chgroup->transform = 1;
> > +                    } else {
> > +                        chgroup->transform = 2;
> > +                        /** cos(pi/4) */
> > +                        chgroup->decorrelation_matrix[0] = 0.70703125;
> > +                        chgroup->decorrelation_matrix[1] = -0.70703125;
> > +                        chgroup->decorrelation_matrix[2] = 0.70703125;
> > +                        chgroup->decorrelation_matrix[3] = 0.70703125;
> > +                    }
>
> why the special handling of 2 vs. >2 channels here?
>

When the stream has only 2 channels, the channels are M/S stereo coded 
(transform 1)
When the stream has more than 2 channels, the matrix multiplication is used.
for the 2 channels that contain data for the current subframe length/offset.
(num_channels in the channel group != num_channels in the stream)

> > +
> > +    /** decode vector coefficients (consumes up to 167 bits per
> > iteration for +      4 vector coded large values) */
> > +    while (!rl_mode && cur_coeff + 3 < s->subframe_len) {
> > +        int vals[4];
> > +        int i;
> > +        unsigned int idx;
> > +
> > +        idx = get_vlc2(&s->gb, vec4_vlc.table, VLCBITS, VEC4MAXDEPTH);
> > +
> > +        if ( idx == HUFF_VEC4_SIZE - 1 ) {
> > +            for (i=0 ; i < 4 ; i+= 2) {
> > +                idx = get_vlc2(&s->gb, vec2_vlc.table, VLCBITS,
> > VEC2MAXDEPTH); +                if ( idx == HUFF_VEC2_SIZE - 1 ) {
> > +                    vals[i] = get_vlc2(&s->gb, vec1_vlc.table, VLCBITS,
> > VEC1MAXDEPTH); +                    if (vals[i] == HUFF_VEC1_SIZE - 1)
> > +                        vals[i] += ff_wma_get_large_val(&s->gb);
> > +                    vals[i+1] = get_vlc2(&s->gb, vec1_vlc.table,
> > VLCBITS, VEC1MAXDEPTH); +                    if (vals[i+1] ==
> > HUFF_VEC1_SIZE - 1)
> > +                        vals[i+1] += ff_wma_get_large_val(&s->gb);
> > +                } else {
> >
> > +                    vals[i] = (symbol_to_vec2[idx] >> 4) & 0xF;
>
> does this really need the & F ?
>

It is no longer needed. Removed.

> > +                    vals[i+1] = symbol_to_vec2[idx] & 0xF;
> > +                }
> > +            }
> > +        } else {
> >
> > +             vals[0] = (symbol_to_vec4[idx] >> 8) >> 4;
>
> can be simplified
>

Done.

> > +                    if ( !idx ) {
> >
> > +                        uint32_t code = get_bits(&s->gb,14);
> > +                        val = code >> 6;
> > +                        sign = (code & 1) - 1;
> > +                        skip = (code & 0x3f)>>1;
>
> any reason why that are not 3 get_bits() ?

It is faster this way.

>
> > +                    } else if (idx == 1) {
> > +                        break;
> > +                    } else {
> > +                        skip = scale_rl_run[idx];
> > +                        val = scale_rl_level[idx];
> > +                        sign = get_bits1(&s->gb)-1;
> > +                    }
> > +
> > +                    i += skip;
> > +                    if (i >= s->num_bands) {
> > +                        av_log(s->avctx,AV_LOG_ERROR,
> > +                               "invalid scale factor coding\n");
> > +                        return AVERROR_INVALIDDATA;
> > +                    } else
> > +                        s->channel[c].scale_factors[i] += (val ^ sign) -
> > sign; +                }
> > +            }
> > +
> > +            s->channel[c].reuse_sf = 1;
> >
> > +            s->channel[c].max_scale_factor =
> > s->channel[c].scale_factors[0]; +            for
> > (sf=s->channel[c].scale_factors + 1; sf < sf_end; sf++) { +              
> >  if (s->channel[c].max_scale_factor < *sf)
> > +                    s->channel[c].max_scale_factor = *sf;
> > +            }
>
> seems duplicated
>

I reworked the handling of the resampled scale factors.

> > +static void inverse_channel_transform(WMA3DecodeContext *s)

[...]

> function ok

Committed.

> > +static void window(WMA3DecodeContext *s)
>
> the function name is not good
>

Changed. Not sure if the new one is better...

> > +    /** calculate number of scale factor bands and their offsets */
> > +    if (subframe_len == s->samples_per_frame) {
> > +        s->num_bands = s->num_sfb[0];
> > +        s->cur_sfb_offsets = s->sfb_offsets;
> > +        s->cur_subwoofer_cutoff = s->subwoofer_cutoffs[0];
> > +    } else {
> > +        int frame_offset = av_log2(s->samples_per_frame/subframe_len);
> > +        s->num_bands = s->num_sfb[frame_offset];
> > +        s->cur_sfb_offsets = &s->sfb_offsets[MAX_BANDS * frame_offset];
> > +        s->cur_subwoofer_cutoff = s->subwoofer_cutoffs[frame_offset];
> > +    }
>
> redundant special casing
>

Removed.

>
> [...]
>
> > +    if (transmit_coeffs) {
> > +        /** reconstruct the per channel data */
> > +        inverse_channel_transform(s);
> > +        for (i=0;i<s->channels_for_cur_subframe;i++) {
> > +            int c = s->channel_indexes_for_cur_subframe[i];
> > +            int* sf;
> > +            int b;
> > +
> >
> > +            if (s->channel[c].transmit_sf) {
> > +                sf = s->channel[c].scale_factors;
> > +            } else
> > +                sf = s->channel[c].resampled_scale_factors;
>
> cant they be put in the same array ? (just asking ...)
>

Done. The array that contains the saved factors is now only used in 
decode_scale_factors. This should improve understandability.

Regards

Sascha

-------------- next part --------------
A non-text attachment was scrubbed...
Name: wmapro_try4.patch
Type: text/x-diff
Size: 61932 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090802/6485f776/attachment.patch>