[FFmpeg-devel] [PATCH] Split reading and decoding of blocks in ALS
Thilo Borgmann
thilo.borgmann
Mon Nov 23 23:30:19 CET 2009
Michael Niedermayer schrieb:
> On Mon, Nov 23, 2009 at 11:28:35AM +0100, Thilo Borgmann wrote:
>> Michael Niedermayer schrieb:
>>> On Sat, Nov 14, 2009 at 02:15:30PM +0100, Thilo Borgmann wrote:
>>>> Hi,
>>>>
>>>> in order to support multi-channel correlation, reading and decoding of a
>>>> block has to be separated. This patch introduces ALSBockData struct for
>>>> that purpose.
>>> [...]
>>>> +/** Decodes the block data for a constant block
>>>> + */
>>>> +static void decode_const_block_data(ALSDecContext *ctx, ALSBlockData *bd)
>>>> +{
>>>> + int smp;
>>>> +
>>>> // write raw samples into buffer
>>>> - for (k = 0; k < block_length; k++)
>>>> - raw_samples[k] = const_val;
>>>> + for (smp = 0; smp < bd->block_length; smp++)
>>>> + bd->raw_samples[smp] = bd->const_val;
>>>> }
>>> memset32(dst, val, len){
>>> }
>>> would possibly be faster due to not doing bd-> in the loop
>> Ok will do...
>>
>>
>>> [...]
>>>> @@ -553,115 +579,132 @@
>>>> }
>>>>
>>>> if (get_bits1(gb))
>>>> - *shift_lsbs = get_bits(gb, 4) + 1;
>>>> + bd->shift_lsbs = get_bits(gb, 4) + 1;
>>>>
>>>> - store_prev_samples = (*js_blocks && raw_other) || *shift_lsbs;
>>>> + bd->store_prev_samples = (bd->js_blocks && bd->raw_other) || bd->shift_lsbs;
>>>>
>>>> -
>>>> if (!sconf->rlslms) {
>>>> if (sconf->adapt_order) {
>>>> - int opt_order_length = av_ceil_log2(av_clip((block_length >> 3) - 1,
>>>> + int opt_order_length = av_ceil_log2(av_clip((bd->block_length >> 3) - 1,
>>>> 2, sconf->max_order + 1));
>>>> - opt_order = get_bits(gb, opt_order_length);
>>>> + bd->opt_order = get_bits(gb, opt_order_length);
>>>> } else {
>>>> - opt_order = sconf->max_order;
>>>> + bd->opt_order = sconf->max_order;
>>>> }
>>>>
>>>> - if (opt_order) {
>>>> + if (bd->opt_order) {
>>>> int add_base;
>>>>
>>>> if (sconf->coef_table == 3) {
>>>> add_base = 0x7F;
>>>>
>>>> // read coefficient 0
>>>> - quant_cof[0] = 32 * parcor_scaled_values[get_bits(gb, 7)];
>>>> + bd->quant_cof[0] = 32 * parcor_scaled_values[get_bits(gb, 7)];
>>>>
>>>> // read coefficient 1
>>>> - if (opt_order > 1)
>>>> - quant_cof[1] = -32 * parcor_scaled_values[get_bits(gb, 7)];
>>>> + if (bd->opt_order > 1)
>>>> + bd->quant_cof[1] = -32 * parcor_scaled_values[get_bits(gb, 7)];
>>>>
>>>> // read coefficients 2 to opt_order
>>>> - for (k = 2; k < opt_order; k++)
>>>> - quant_cof[k] = get_bits(gb, 7);
>>>> + for (k = 2; k < bd->opt_order; k++)
>>>> + bd->quant_cof[k] = get_bits(gb, 7);
>>>> } else {
>>>> int k_max;
>>>> add_base = 1;
>>>>
>>>> // read coefficient 0 to 19
>>>> - k_max = FFMIN(opt_order, 20);
>>>> + k_max = FFMIN(bd->opt_order, 20);
>>>> for (k = 0; k < k_max; k++) {
>>>> int rice_param = parcor_rice_table[sconf->coef_table][k][1];
>>>> int offset = parcor_rice_table[sconf->coef_table][k][0];
>>>> - quant_cof[k] = decode_rice(gb, rice_param) + offset;
>>>> + bd->quant_cof[k] = decode_rice(gb, rice_param) + offset;
>>>> }
>>>>
>>>> // read coefficients 20 to 126
>>>> - k_max = FFMIN(opt_order, 127);
>>>> + k_max = FFMIN(bd->opt_order, 127);
>>>> for (; k < k_max; k++)
>>>> - quant_cof[k] = decode_rice(gb, 2) + (k & 1);
>>>> + bd->quant_cof[k] = decode_rice(gb, 2) + (k & 1);
>>>>
>>>> // read coefficients 127 to opt_order
>>>> - for (; k < opt_order; k++)
>>>> - quant_cof[k] = decode_rice(gb, 1);
>>>> + for (; k < bd->opt_order; k++)
>>>> + bd->quant_cof[k] = decode_rice(gb, 1);
>>>>
>>>> - quant_cof[0] = 32 * parcor_scaled_values[quant_cof[0] + 64];
>>>> + bd->quant_cof[0] = 32 * parcor_scaled_values[bd->quant_cof[0] + 64];
>>>>
>>>> - if (opt_order > 1)
>>>> - quant_cof[1] = -32 * parcor_scaled_values[quant_cof[1] + 64];
>>>> + if (bd->opt_order > 1)
>>>> + bd->quant_cof[1] = -32 * parcor_scaled_values[bd->quant_cof[1] + 64];
>>>> }
>>>>
>>>> - for (k = 2; k < opt_order; k++)
>>>> - quant_cof[k] = (quant_cof[k] << 14) + (add_base << 13);
>>>> + for (k = 2; k < bd->opt_order; k++)
>>>> + bd->quant_cof[k] = (bd->quant_cof[k] << 14) + (add_base << 13);
>>> it might make sense to have commonly used variables like opt_order on the
>>> stack intead of just accessable through a pointer
>> Yes.
>>
>>> besides does this patch lead to any slowdown?
>> The benchmark results showed a little slowdown but at least hardly
>> noticable on the command line...
>> We had this issue at the soc list. Ended up with me providing some
>> assembler file grep results for checking inlining done by the compiler.
>
> was the slowdown fixed or not?
> "not" is bad
It was not. You blamed the many bd-> to be a reason for the slowdown and
requested a check for wrong inlining before considering other
possibilities to make this faster. I'm not able to see anything about
the inlining stuff in assembler files so I provided a grep result you
asked for.
See:
https://lists.mplayerhq.hu/pipermail/ffmpeg-soc/2009-October/008364.html
I'm ready to fix that slowdown issue, all I need is a little advise
about how to... the obvious way to use local variables to store
bd->something and removing many dereferences that way did not yield a
significant reduction of the slowdown.
-Thilo
More information about the ffmpeg-devel
mailing list