[FFmpeg-devel] [PATCH] MLP/TrueHD Decoder - 2nd try
Michael Niedermayer
michaelni
Thu Jul 3 05:30:15 CEST 2008
On Thu, Jul 03, 2008 at 01:47:11AM +0100, Ramiro Polla wrote:
> Michael Niedermayer wrote:
>> On Tue, Jul 01, 2008 at 06:24:55PM +0100, Ramiro Polla wrote:
>>> Michael Niedermayer wrote:
>>>> On Tue, Jul 01, 2008 at 01:55:39AM +0100, Ramiro Polla wrote:
>> [...]
>>>> [...]
>>>>>> [...]
>>>>>>> /** Generate a PCM sample using the prediction filters and a residual
>>>>>>> value
>>>>>>> * read from the data stream, and update the filter state.
>>>>>>> */
>>>>>>>
>>>>>>> static int filter_sample(MLPDecodeContext *m, unsigned int substr,
>>>>>>> unsigned int channel, int32_t residual)
>>>>>>> {
>>>>>>> unsigned int i, j, index;
>>>>>>> int64_t accum = 0;
>>>>>>> int32_t result;
>>>>>>>
>>>>>>> /* TODO: Move this code to DSPContext? */
>>>>>>>
>>>>>>> #define INDEX(channel, order, pos) \
>>>>>>> ((m->filter_index[channel][order] + (pos)) & (MAX_FILTER_ORDER -
>>>>>>> 1))
>>>>>>>
>>>>>>> for (j = 0; j < 2; j++)
>>>>>>> for (i = 0; i < m->filter_order[channel][j]; i++)
>>>>>>> accum +=
>>>>>>> (int64_t)m->filter_state[channel][j][INDEX(channel,j,i)] *
>>>>>>> m->filter_coeff[channel][j][i];
>>>>>>>
>>>>>>> accum = accum >> m->filter_coeff_q[channel][FIR];
>>>>>>> result = (accum + residual)
>>>>>>> & ~((1 << m->quant_step_size[substr][channel]) - 1);
>>>>>>>
>>>>>>> index = INDEX(channel, FIR, -1);
>>>>>>> m->filter_state[channel][FIR][index] = result;
>>>>>>> m->filter_index[channel][FIR] = index;
>>>>>>>
>>>>>>> index = INDEX(channel, IIR, -1);
>>>>>>> m->filter_state[channel][IIR][index] = result - accum;
>>>>>>> m->filter_index[channel][IIR] = index;
>>>>>>>
>>>>>>> #undef INDEX
>>>>>>>
>>>>>>> return result;
>>>>>>> }
>>>>>> Why the complicated indexing?
>>>>>> how big would filter_state be if it where a normal flat array? 160
>>>>>> entries?
>>>>> It would need
>>>>> sizeof(int32_t) * MAX_BLOCKSIZE * 2 * MAX_CHANNELS = 20480 = 20k
>>>>> bytes more in the context, and attached filter_index_flat_array.diff
>>>>> patch. I assume that's not ok =)
>>>> yes, but do we really need that amount?
>>>> if the huff decode and filter loops are split (i assume thats possible)
>>>> then
>>>> we can do a
>>>> for(channels)
>>>> for(sample in block)
>>>> which would only need sizeof(int32_t) * 2 * MAX_BLOCKSIZE
>>>> and a memcpy() or 2 for a reading writing the state
>>> Done this way and got some speedup. But for now I leave
>>> filter_state_buffer in the context. Should I move it to the stack as in
>>> attached filter_state_buffer_stack.diff patch?
>> yes
>
> Applied.
>
>> [...]
>>> /** Maximum number of substreams that can be decoded. This could also be
>>> set
>>> * higher, but again I haven't seen any examples with more than two. */
>>> #define MAX_SUBSTREAMS 2
>>>
>>> /** Maximum sample frequency supported. */
>>> #define MAX_SAMPLERATE 192000
>> slightly unclear, supported by our implementation or MLP?
>
> According to the coded values, MLP should support up to 48000 << 7. But
> then again, I didn't do the RE, and all the samples I've seen don't go
> higher than 192000.
>
> Should there be a comment stating this code does not follow specs and is
> based on RE and so some values may be off?
"Maximum sample frequency seen in files"
[...]
>> [...]
>>> /** Initialize static data, constant between all invocations of the
>>> codec. */
>>>
>>> static void init_static()
>> av_cold
>
> Changed. And should all init and close functions be av_cold as well?
yes
> I
> think this has not been enforced on some new code.
:(
>
>>> {
>>> if (!huff_vlc[0].bits) {
>> checking the last set element avoids race conditions (its not a real
>> issues
>> as avcodec_open() isnt thread safe)
>
> So, can I leave it as is? And I remember you telling Robert that those
> checks could be suppressed for the VLC tables in his AAC code. Does this
> also apply here?
probably
>
>> [...]
>>> /** Read a restart header from a block in a substream. This contains
>>> parameters
>>> * required to decode the audio that do not change very often. Generally
>>> * (always) present only in blocks following a major sync.
>>> */
>>>
>>> static int read_restart_header(MLPDecodeContext *m, GetBitContext *gbp,
>>> const uint8_t *buf, unsigned int substr)
>>> {
>>> SubStream *s = &m->substream[substr];
>>> unsigned int ch;
>>> int sync_word, tmp;
>>> uint8_t checksum;
>>> uint8_t lossless_check;
>>> int start_count = get_bits_count(gbp);
>>>
>>> sync_word = get_bits(gbp, 14);
>>>
>>> if ((sync_word & 0x3ffe) != 0x31ea) {
>> sync_word = get_bits(gbp, 13);
>> if(sync_word != ...
>
> sync_word is later used in the code to check if it's either 0x31ea or
> 0x31eb.
yes, these would then be 1 vs 0 checks of the 1 bit read after the sync,
which IMHO is cleaner.
>
>> [...]
>>> for (ch = s->min_channel; ch <= s->max_channel; ch++) {
>>> m->filter_order [ch][FIR] = 0;
>>> m->filter_order [ch][IIR] = 0;
>>> m->filter_coeff_q[ch][FIR] = 0;
>>> m->filter_coeff_q[ch][IIR] = 0;
>>>
>>> memset(m->filter_coeff[ch], 0, sizeof(m->filter_coeff[ch]));
>>> memset(m->filter_state[ch], 0, sizeof(m->filter_state[ch]));
>> If order (= number of taps) is 0 then the coeffs should not matter
>
> And if it's not zero they're read from the bitstream, so these memsets are
> pointless. Removed.
>
>> [...]
>>> if (order > 0) {
>>> int coeff_bits, coeff_shift;
>>>
>>> m->filter_coeff_q[channel][filter] = get_bits(gbp, 4);
>>>
>>> coeff_bits = get_bits(gbp, 5);
>>> coeff_shift = get_bits(gbp, 3);
>>> if (coeff_bits < 1 || coeff_bits > 16) {
>>> av_log(m->avctx, AV_LOG_ERROR,
>>> "%cIR filter coeff_bits must be between 1 and 16\n",
>>> fchar);
>>> return -1;
>>> }
>>> if (coeff_bits + coeff_shift > 16) {
>>> av_log(m->avctx, AV_LOG_ERROR,
>>> "Sum of coeff_bits and coeff_shift for %cIR filter
>>> must be 16 or less\n",
>>> fchar);
>>> return -1;
>>> }
>>>
>>> for (i = 0; i < order; i++)
>>> m->filter_coeff[channel][filter][i] =
>>> get_sbits(gbp, coeff_bits) << coeff_shift;
>> Cant coeff_shift just be subtracted from filter_coeff_q ?
>
> filter_coeff_q is the same for both filters. coeff_shift can differ.
>
> Besides wouldn't this just change a shift for a minus while reading the
> data and leave all the rest of the muls and shifts the same?
hmm, forget my suggestion it was silly ...
[...]
> static int mlp_decode_init(AVCodecContext *avctx)
av_cold
[...]
> for (i = 0; i < s->blocksize; i++) {
> int32_t residual = m->sample_buffer[i + s->blockpos][channel];
> unsigned int order;
> int64_t accum = 0;
> int32_t result;
>
> /* TODO: Move this code to DSPContext? */
>
> for (j = 0; j < NUM_FILTERS; j++)
> for (order = 0; order < m->filter_order[channel][j]; order++)
> accum += (int64_t)filter_state_buffer[j][index + order] *
> m->filter_coeff[channel][j][order];
>
> accum = accum >> filter_coeff_q;
> result = (accum + residual) & mask;
>
> --index;
>
> filter_state_buffer[FIR][index] = result;
> filter_state_buffer[IIR][index] = result - accum;
>
> m->sample_buffer[i + s->blockpos][channel] = result;
> }
is there any reason for -- instead of using i ?
[...]
> /** Generate two channels of noise, used in the matrix when
> * restart_sync_word == 0x31ea. */
>
> static void generate_noise_1(MLPDecodeContext *m, unsigned int substr)
> {
> SubStream *s = &m->substream[substr];
> unsigned int i;
> uint32_t seed = s->noisegen_seed;
> unsigned int maxchan = s->max_matrix_channel;
>
> for (i = 0; i < s->blockpos; i++) {
> uint16_t seed_shr7 = seed >> 7;
> m->sample_buffer[i][maxchan+1] = ((int8_t)(seed >> 15)) << s->noise_shift;
> m->sample_buffer[i][maxchan+2] = ((int8_t) seed_shr7) << s->noise_shift;
>
> seed = (seed << 16) ^ seed_shr7 ^ (seed_shr7 << 5);
> }
>
> s->noisegen_seed = seed;
> }
>
> /** Generate a block of noise, used when restart_sync_word == 0x31eb. */
>
> static void generate_noise_2(MLPDecodeContext *m, unsigned int substr)
> {
the 2 functions should be named more correctly like
generate_2_noise_channels()
fill_noise_buffer()
[...]
> /** Read an access unit from the stream.
> * Returns -1 on error, 0 if not enough data is present in the input stream
> * otherwise returns the number of bytes consumed. */
i think "returns <0 on error" is better
[...]
> rematrix_channels(m, substr - 1);
>
> if (output_data(m, substr - 1, data, data_size) < 0)
> return -1;
shouldnt these rather be max_decoded_substream? I mean substr depends on the
loop above ending with substr-1 = max_decoded_substream
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080703/47c64c5d/attachment.pgp>
More information about the ffmpeg-devel
mailing list