[FFmpeg-devel] [PATCH] MLP/TrueHD Decoder - 2nd try

Thu Jul 3 05:30:15 CEST 2008

On Thu, Jul 03, 2008 at 01:47:11AM +0100, Ramiro Polla wrote:
> Michael Niedermayer wrote:
>> On Tue, Jul 01, 2008 at 06:24:55PM +0100, Ramiro Polla wrote:
>>> Michael Niedermayer wrote:
>>>> On Tue, Jul 01, 2008 at 01:55:39AM +0100, Ramiro Polla wrote:
>> [...]
>>>> [...]
>>>>>> [...]
>>>>>>> /** Generate a PCM sample using the prediction filters and a residual 
>>>>>>> value
>>>>>>>  *  read from the data stream, and update the filter state.
>>>>>>>  */
>>>>>>>
>>>>>>> static int filter_sample(MLPDecodeContext *m, unsigned int substr,
>>>>>>>                          unsigned int channel, int32_t residual)
>>>>>>> {
>>>>>>>     unsigned int i, j, index;
>>>>>>>     int64_t accum = 0;
>>>>>>>     int32_t result;
>>>>>>>
>>>>>>>     /* TODO: Move this code to DSPContext? */
>>>>>>>
>>>>>>> #define INDEX(channel, order, pos) \
>>>>>>>     ((m->filter_index[channel][order] + (pos)) & (MAX_FILTER_ORDER - 
>>>>>>> 1))
>>>>>>>
>>>>>>>     for (j = 0; j < 2; j++)
>>>>>>>         for (i = 0; i < m->filter_order[channel][j]; i++)
>>>>>>>             accum += 
>>>>>>> (int64_t)m->filter_state[channel][j][INDEX(channel,j,i)] *
>>>>>>>                      m->filter_coeff[channel][j][i];
>>>>>>>
>>>>>>>     accum = accum >> m->filter_coeff_q[channel][FIR];
>>>>>>>     result = (accum + residual)
>>>>>>>                 & ~((1 << m->quant_step_size[substr][channel]) - 1);
>>>>>>>
>>>>>>>     index = INDEX(channel, FIR, -1);
>>>>>>>     m->filter_state[channel][FIR][index] = result;
>>>>>>>     m->filter_index[channel][FIR] = index;
>>>>>>>
>>>>>>>     index = INDEX(channel, IIR, -1);
>>>>>>>     m->filter_state[channel][IIR][index] = result - accum;
>>>>>>>     m->filter_index[channel][IIR] = index;
>>>>>>>
>>>>>>> #undef INDEX
>>>>>>>
>>>>>>>     return result;
>>>>>>> }
>>>>>> Why the complicated indexing?
>>>>>> how big would filter_state be if it where a normal flat array? 160 
>>>>>> entries?
>>>>> It would need
>>>>> sizeof(int32_t) * MAX_BLOCKSIZE * 2 * MAX_CHANNELS = 20480 = 20k
>>>>> bytes more in the context, and attached filter_index_flat_array.diff 
>>>>> patch. I assume that's not ok =)
>>>> yes, but do we really need that amount?
>>>> if the huff decode and filter loops are split (i assume thats possible) 
>>>> then
>>>> we can do a
>>>> for(channels)
>>>>     for(sample in block)
>>>> which would only need sizeof(int32_t) * 2 * MAX_BLOCKSIZE
>>>> and a memcpy() or 2 for a reading writing the state
>>> Done this way and got some speedup. But for now I leave 
>>> filter_state_buffer in the context. Should I move it to the stack as in 
>>> attached filter_state_buffer_stack.diff patch?
>> yes
>
> Applied.
>
>> [...]
>>> /** Maximum number of substreams that can be decoded. This could also be 
>>> set
>>>  *  higher, but again I haven't seen any examples with more than two. */
>>> #define MAX_SUBSTREAMS      2
>>>
>>> /** Maximum sample frequency supported. */
>>> #define MAX_SAMPLERATE      192000
>> slightly unclear, supported by our implementation or MLP?
>
> According to the coded values, MLP should support up to 48000 << 7. But 
> then again, I didn't do the RE, and all the samples I've seen don't go 
> higher than 192000.
>
> Should there be a comment stating this code does not follow specs and is 
> based on RE and so some values may be off?

"Maximum sample frequency seen in files"

[...]
>> [...]
>>> /** Initialize static data, constant between all invocations of the 
>>> codec. */
>>>
>>> static void init_static()
>> av_cold
>
> Changed. And should all init and close functions be av_cold as well? 

yes

> I 
> think this has not been enforced on some new code.

:(

>
>>> {
>>>     if (!huff_vlc[0].bits) {
>> checking the last set element avoids race conditions (its not a real 
>> issues
>> as avcodec_open() isnt thread safe)
>
> So, can I leave it as is? And I remember you telling Robert that those 
> checks could be suppressed for the VLC tables in his AAC code. Does this 
> also apply here?

probably

>
>> [...]
>>> /** Read a restart header from a block in a substream. This contains 
>>> parameters
>>>  *  required to decode the audio that do not change very often. Generally
>>>  *  (always) present only in blocks following a major sync.
>>>  */
>>>
>>> static int read_restart_header(MLPDecodeContext *m, GetBitContext *gbp,
>>>                                const uint8_t *buf, unsigned int substr)
>>> {
>>>     SubStream *s = &m->substream[substr];
>>>     unsigned int ch;
>>>     int sync_word, tmp;
>>>     uint8_t checksum;
>>>     uint8_t lossless_check;
>>>     int start_count = get_bits_count(gbp);
>>>
>>>     sync_word = get_bits(gbp, 14);
>>>
>>>     if ((sync_word & 0x3ffe) != 0x31ea) {
>> sync_word = get_bits(gbp, 13);
>> if(sync_word != ...
>
> sync_word is later used in the code to check if it's either 0x31ea or 
> 0x31eb.

yes, these would then be 1 vs 0 checks of the 1 bit read after the sync,
which IMHO is cleaner.

>
>> [...]
>>>     for (ch = s->min_channel; ch <= s->max_channel; ch++) {
>>>         m->filter_order  [ch][FIR] = 0;
>>>         m->filter_order  [ch][IIR] = 0;
>>>         m->filter_coeff_q[ch][FIR] = 0;
>>>         m->filter_coeff_q[ch][IIR] = 0;
>>>
>>>         memset(m->filter_coeff[ch], 0, sizeof(m->filter_coeff[ch]));
>>>         memset(m->filter_state[ch], 0, sizeof(m->filter_state[ch]));
>> If order (= number of taps) is 0 then the coeffs should not matter
>
> And if it's not zero they're read from the bitstream, so these memsets are 
> pointless. Removed.
>
>> [...]
>>>     if (order > 0) {
>>>         int coeff_bits, coeff_shift;
>>>
>>>         m->filter_coeff_q[channel][filter] = get_bits(gbp, 4);
>>>
>>>         coeff_bits  = get_bits(gbp, 5);
>>>         coeff_shift = get_bits(gbp, 3);
>>>         if (coeff_bits < 1 || coeff_bits > 16) {
>>>             av_log(m->avctx, AV_LOG_ERROR,
>>>                    "%cIR filter coeff_bits must be between 1 and 16\n",
>>>                    fchar);
>>>             return -1;
>>>         }
>>>         if (coeff_bits + coeff_shift > 16) {
>>>             av_log(m->avctx, AV_LOG_ERROR,
>>>                    "Sum of coeff_bits and coeff_shift for %cIR filter 
>>> must be 16 or less\n",
>>>                    fchar);
>>>             return -1;
>>>         }
>>>
>>>         for (i = 0; i < order; i++)
>>>             m->filter_coeff[channel][filter][i] =
>>>                     get_sbits(gbp, coeff_bits) << coeff_shift;
>> Cant coeff_shift just be subtracted from filter_coeff_q ?
>
> filter_coeff_q is the same for both filters. coeff_shift can differ.
>
> Besides wouldn't this just change a shift for a minus while reading the 
> data and leave all the rest of the muls and shifts the same?

hmm, forget my suggestion it was silly ...

[...]
> static int mlp_decode_init(AVCodecContext *avctx)

av_cold

[...]
>     for (i = 0; i < s->blocksize; i++) {
>         int32_t residual = m->sample_buffer[i + s->blockpos][channel];
>         unsigned int order;
>         int64_t accum = 0;
>         int32_t result;
> 
>         /* TODO: Move this code to DSPContext? */
> 
>         for (j = 0; j < NUM_FILTERS; j++)
>             for (order = 0; order < m->filter_order[channel][j]; order++)
>                 accum += (int64_t)filter_state_buffer[j][index + order] *
>                         m->filter_coeff[channel][j][order];
> 
>         accum  = accum >> filter_coeff_q;
>         result = (accum + residual) & mask;
> 

>         --index;
> 
>         filter_state_buffer[FIR][index] = result;
>         filter_state_buffer[IIR][index] = result - accum;
> 
>         m->sample_buffer[i + s->blockpos][channel] = result;
>     }

is there any reason for -- instead of using i ?

[...]
> /** Generate two channels of noise, used in the matrix when
>  *  restart_sync_word == 0x31ea. */
> 
> static void generate_noise_1(MLPDecodeContext *m, unsigned int substr)
> {
>     SubStream *s = &m->substream[substr];
>     unsigned int i;
>     uint32_t seed = s->noisegen_seed;
>     unsigned int maxchan = s->max_matrix_channel;
> 
>     for (i = 0; i < s->blockpos; i++) {
>         uint16_t seed_shr7 = seed >> 7;
>         m->sample_buffer[i][maxchan+1] = ((int8_t)(seed >> 15)) << s->noise_shift;
>         m->sample_buffer[i][maxchan+2] = ((int8_t) seed_shr7)   << s->noise_shift;
> 
>         seed = (seed << 16) ^ seed_shr7 ^ (seed_shr7 << 5);
>     }
> 
>     s->noisegen_seed = seed;
> }
> 
> /** Generate a block of noise, used when restart_sync_word == 0x31eb. */
> 
> static void generate_noise_2(MLPDecodeContext *m, unsigned int substr)
> {

the 2 functions should be named more correctly like
generate_2_noise_channels()
fill_noise_buffer()

[...]
> /** Read an access unit from the stream.
>  *  Returns -1 on error, 0 if not enough data is present in the input stream
>  *  otherwise returns the number of bytes consumed. */

i think "returns <0 on error" is better

[...]
>     rematrix_channels(m, substr - 1);
> 
>     if (output_data(m, substr - 1, data, data_size) < 0)
>         return -1;

shouldnt these rather be max_decoded_substream? I mean substr depends on the
loop above ending with substr-1 = max_decoded_substream

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080703/47c64c5d/attachment.pgp>