[FFmpeg-devel] [PATCH] wmapro decoder
Sascha Sommer
saschasommer
Sun Aug 2 15:11:50 CEST 2009
Hi,
> > + * The compressed wmapro bitstream is split into individual packets.
> > + * Every such packet contains one or more wma frames.
> > + * The compressed frames may have a variable length and frames may
> > + * cross packet boundaries.
> > + * Common to all wmapro frames is the number of samples that are stored
> > in + * a frame.
> > + * The number of samples and a few other decode flags are stored
> > + * as extradata that has to be passed to the decoder.
> > + *
> > + * The wmapro frames themselves are again split into a variable number
> > of + * subframes. Every subframe contains the data for 2^N time domain
> > samples + * where N varies between 7 and 12.
> > + *
> > + * The frame layouts for the individual channels of a wma frame does not
> > need + * to be the same.
>
> maybe you want to draw some simple ascii art to show these things ...
>
Done.
> > +/**
> > + * @brief decoder context for a single channel
> > + */
> > +typedef struct {
> > + int16_t prev_block_len; ///< length of the
> > previous block + uint8_t transmit_coefs;
> > + uint8_t num_subframes;
> > + uint16_t subframe_len[MAX_SUBFRAMES]; ///< subframe
> > length in samples
> >
> > + uint16_t subframe_offset[MAX_SUBFRAMES]; ///< subframe
> > position
>
> a little unclear, position in the stream vs. frame
>
> > + uint8_t cur_subframe; ///< subframe
> > index
>
> same and it applies to more fields as well i think
>
Fixed.
> > + int max_scale_factor; ///< maximum scale
> > factor + int scale_factors[MAX_BANDS]; ///< scale
> > factor values
>
> doxy seem redundant, they just repeat the var names ...
>
Fixed.
> > + int resampled_scale_factors[MAX_BANDS]; ///< scale factors
> > from a previous block + int16_t scale_factor_block_len;
> > ///< scale factor reference block length + float* coeffs;
> > ///< pointer to the decode buffer +
> > DECLARE_ALIGNED_16(float, out[2*WMAPRO_BLOCK_MAX_SIZE]); ///< output
> > buffer +} WMA3ChannelCtx;
> > +
> > +/**
> > + * @brief channel group for channel transformations
> > + */
> > +typedef struct {
> > + uint8_t num_channels; ///<
> > number of channels in the group + int8_t transform;
> > ///< controls the type of the transform + int8_t
> > transform_band[MAX_BANDS]; ///< controls if the
> > transform is enabled for a certain band
> >
> > + float
> > decorrelation_matrix[WMAPRO_MAX_CHANNELS*WMAPRO_MAX_CHANNELS];
>
> is it simpler if its [WMAPRO_MAX_CHANNELS][WMAPRO_MAX_CHANNELS] ?
>
I don't think so. At the moment the values are stored in the order they are
used. If it were a 2d array, there would be gaps that need to be handled in
various places.
> > +/**
> > + *@brief helper function to print the most important members of the
> > context + *@param s context
> > + */
> > +static void av_cold dump_context(WMA3DecodeContext *s)
> > +{
> >
> > +#define PRINT(a,b) av_log(s->avctx,AV_LOG_DEBUG," %s = %d\n", a, b);
> > +#define PRINT_HEX(a,b) av_log(s->avctx,AV_LOG_DEBUG," %s = %x\n", a, b);
>
> vertical align
>
Fixed.
> > +static av_cold int decode_end(AVCodecContext *avctx)
[...]
> function ok, can be commited if you like
>
Committed.
> > +
> > +/**
> > + *@brief Initialize the decoder.
> > + *@param avctx codec context
> > + *@return 0 on success, -1 otherwise
> > + */
> > +static av_cold int decode_init(AVCodecContext *avctx)
> > +{
> > + WMA3DecodeContext *s = avctx->priv_data;
> > + uint8_t *edata_ptr = avctx->extradata;
> > + int16_t* sfb_offsets;
> > + unsigned int channel_mask;
> > + int i;
> > + int log2_num_subframes;
> > +
> > + s->avctx = avctx;
> > + dsputil_init(&s->dsp, avctx);
> > +
> > + avctx->sample_fmt = SAMPLE_FMT_FLT;
> > +
> > + if (avctx->extradata_size >= 18) {
> >
> > + s->decode_flags = AV_RL16(edata_ptr+14);
> > + channel_mask = AV_RL32(edata_ptr+2);
> > + s->bits_per_sample = AV_RL16(edata_ptr);
>
> vertical align
>
Fixed.
> > + s->len_prefix = (s->decode_flags & 0x40) >> 6;
>
> the >> is uneeded
>
Removed.
> > +
> > + if (!s->len_prefix) {
> > + av_log_ask_for_sample(avctx, "no length prefix\n");
> > + return AVERROR_INVALIDDATA;
> > + }
> > +
> > + /** get frame len */
> > + s->samples_per_frame = 1 <<
> > ff_wma_get_frame_len_bits(avctx->sample_rate, +
> > 3, s->decode_flags); +
> > + /** init previous block len */
> > + for (i=0;i<avctx->channels;i++)
> > + s->channel[i].prev_block_len = s->samples_per_frame;
> > +
> > + /** subframe info */
> > + log2_num_subframes = ((s->decode_flags & 0x38) >> 3);
> > + s->max_num_subframes = 1 << log2_num_subframes;
> > + s->num_possible_block_sizes = log2_num_subframes + 1;
> > + s->min_samples_per_subframe = s->samples_per_frame /
> > s->max_num_subframes; + s->dynamic_range_compression =
> > (s->decode_flags & 0x80);
>
> these assignments also could be aligned so that = is at the same pos, i
> think this would be more readable ....
>
Fixed.
> > + for (i=0;i<s->num_possible_block_sizes;i++) {
> > + int subframe_len = s->samples_per_frame >> i;
> > + int x;
> > + int band = 1;
> > +
> > + sfb_offsets[0] = 0;
> > +
> > + for (x=0;x < MAX_BANDS-1 && sfb_offsets[band-1] <
> > subframe_len;x++) { + int offset = (subframe_len * 2 *
> > critical_freq[x])
> > + / s->avctx->sample_rate + 2;
> > + offset &= ~3;
> > + if ( offset > sfb_offsets[band - 1] )
> > + sfb_offsets[band++] = offset;
> > + }
> > + sfb_offsets[band - 1] = subframe_len;
> > + s->num_sfb[i] = band - 1;
> > + sfb_offsets += MAX_BANDS;
> > + }
>
> looking at the MAX_BANDS i was wonderin if the thing might be better as a
> 2d array?
>
Yes. Also sf_offsets can be a multidimensional array.
Changed.
> > +
> > +
> > + /** Scale factors can be shared between blocks of different size
> > + as every block has a different scale factor band layout.
> > + The matrix sf_offsets is needed to find the correct scale
> > factor. + */
> > +
> > + for (i=0;i<s->num_possible_block_sizes;i++) {
> > + int b;
> >
> > + for (b=0;b< s->num_sfb[i];b++) {
>
> the whitespace seems inconsistent around < here
>
Fixed.
> >
> > + dump_context(s);
>
> this should be under some FF_DEBUG_ if() like in the video decoders
> otherwise it can become pretty hard to read the result as there are
> too many av_logs being printed ...
>
Fixed.
> > + * If the subframes are not evenly split, the algorithm estimates
> > the + * channels with the lowest number of total samples.
> > + * Afterwards, for each of these channels a bit is read from the
> >
> > + * bitstream that indicates if the channel contains a frame with
> > the
>
> frame or subframe ?
>
Subframe. Fixed.
> > + } else if (s->max_num_subframes == 16) {
> > + subframe_len_zero_bit = 1;
> > + subframe_len_bits = 3;
> > + } else
> > + subframe_len_bits = av_log2(av_log2(s->max_num_subframes)) +
> > 1;
>
> the special casing of subframe_len_bits seems unneeded
>
Fixed.
> > +static void decode_decorrelation_matrix(WMA3DecodeContext* s,
> > + WMA3ChannelGroup* chgroup)
>
> i wonder if decorrelation_matrix should be a 2d array besides this, this
> function is ok
>
Committed. See above regarding the 2d array.
>
> [...]
>
> > + /** decode transform type */
> > + if (chgroup->num_channels == 2) {
> > + if (get_bits1(&s->gb)) {
> > + if (get_bits1(&s->gb)) {
> > + av_log_ask_for_sample(s->avctx,
> > + "unsupported channel transform type\n");
> > + }
> > + } else {
> >
> > + if (s->num_channels == 2) {
> > + chgroup->transform = 1;
> > + } else {
> > + chgroup->transform = 2;
> > + /** cos(pi/4) */
> > + chgroup->decorrelation_matrix[0] = 0.70703125;
> > + chgroup->decorrelation_matrix[1] = -0.70703125;
> > + chgroup->decorrelation_matrix[2] = 0.70703125;
> > + chgroup->decorrelation_matrix[3] = 0.70703125;
> > + }
>
> why the special handling of 2 vs. >2 channels here?
>
When the stream has only 2 channels, the channels are M/S stereo coded
(transform 1)
When the stream has more than 2 channels, the matrix multiplication is used.
for the 2 channels that contain data for the current subframe length/offset.
(num_channels in the channel group != num_channels in the stream)
> > +
> > + /** decode vector coefficients (consumes up to 167 bits per
> > iteration for + 4 vector coded large values) */
> > + while (!rl_mode && cur_coeff + 3 < s->subframe_len) {
> > + int vals[4];
> > + int i;
> > + unsigned int idx;
> > +
> > + idx = get_vlc2(&s->gb, vec4_vlc.table, VLCBITS, VEC4MAXDEPTH);
> > +
> > + if ( idx == HUFF_VEC4_SIZE - 1 ) {
> > + for (i=0 ; i < 4 ; i+= 2) {
> > + idx = get_vlc2(&s->gb, vec2_vlc.table, VLCBITS,
> > VEC2MAXDEPTH); + if ( idx == HUFF_VEC2_SIZE - 1 ) {
> > + vals[i] = get_vlc2(&s->gb, vec1_vlc.table, VLCBITS,
> > VEC1MAXDEPTH); + if (vals[i] == HUFF_VEC1_SIZE - 1)
> > + vals[i] += ff_wma_get_large_val(&s->gb);
> > + vals[i+1] = get_vlc2(&s->gb, vec1_vlc.table,
> > VLCBITS, VEC1MAXDEPTH); + if (vals[i+1] ==
> > HUFF_VEC1_SIZE - 1)
> > + vals[i+1] += ff_wma_get_large_val(&s->gb);
> > + } else {
> >
> > + vals[i] = (symbol_to_vec2[idx] >> 4) & 0xF;
>
> does this really need the & F ?
>
It is no longer needed. Removed.
> > + vals[i+1] = symbol_to_vec2[idx] & 0xF;
> > + }
> > + }
> > + } else {
> >
> > + vals[0] = (symbol_to_vec4[idx] >> 8) >> 4;
>
> can be simplified
>
Done.
> > + if ( !idx ) {
> >
> > + uint32_t code = get_bits(&s->gb,14);
> > + val = code >> 6;
> > + sign = (code & 1) - 1;
> > + skip = (code & 0x3f)>>1;
>
> any reason why that are not 3 get_bits() ?
It is faster this way.
>
> > + } else if (idx == 1) {
> > + break;
> > + } else {
> > + skip = scale_rl_run[idx];
> > + val = scale_rl_level[idx];
> > + sign = get_bits1(&s->gb)-1;
> > + }
> > +
> > + i += skip;
> > + if (i >= s->num_bands) {
> > + av_log(s->avctx,AV_LOG_ERROR,
> > + "invalid scale factor coding\n");
> > + return AVERROR_INVALIDDATA;
> > + } else
> > + s->channel[c].scale_factors[i] += (val ^ sign) -
> > sign; + }
> > + }
> > +
> > + s->channel[c].reuse_sf = 1;
> >
> > + s->channel[c].max_scale_factor =
> > s->channel[c].scale_factors[0]; + for
> > (sf=s->channel[c].scale_factors + 1; sf < sf_end; sf++) { +
> > if (s->channel[c].max_scale_factor < *sf)
> > + s->channel[c].max_scale_factor = *sf;
> > + }
>
> seems duplicated
>
I reworked the handling of the resampled scale factors.
> > +static void inverse_channel_transform(WMA3DecodeContext *s)
[...]
> function ok
Committed.
> > +static void window(WMA3DecodeContext *s)
>
> the function name is not good
>
Changed. Not sure if the new one is better...
> > + /** calculate number of scale factor bands and their offsets */
> > + if (subframe_len == s->samples_per_frame) {
> > + s->num_bands = s->num_sfb[0];
> > + s->cur_sfb_offsets = s->sfb_offsets;
> > + s->cur_subwoofer_cutoff = s->subwoofer_cutoffs[0];
> > + } else {
> > + int frame_offset = av_log2(s->samples_per_frame/subframe_len);
> > + s->num_bands = s->num_sfb[frame_offset];
> > + s->cur_sfb_offsets = &s->sfb_offsets[MAX_BANDS * frame_offset];
> > + s->cur_subwoofer_cutoff = s->subwoofer_cutoffs[frame_offset];
> > + }
>
> redundant special casing
>
Removed.
>
> [...]
>
> > + if (transmit_coeffs) {
> > + /** reconstruct the per channel data */
> > + inverse_channel_transform(s);
> > + for (i=0;i<s->channels_for_cur_subframe;i++) {
> > + int c = s->channel_indexes_for_cur_subframe[i];
> > + int* sf;
> > + int b;
> > +
> >
> > + if (s->channel[c].transmit_sf) {
> > + sf = s->channel[c].scale_factors;
> > + } else
> > + sf = s->channel[c].resampled_scale_factors;
>
> cant they be put in the same array ? (just asking ...)
>
Done. The array that contains the saved factors is now only used in
decode_scale_factors. This should improve understandability.
Regards
Sascha
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wmapro_try4.patch
Type: text/x-diff
Size: 61932 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090802/6485f776/attachment.patch>
More information about the ffmpeg-devel
mailing list