[FFmpeg-devel] [PATCH] Split reading and decoding of blocks in ALS
Michael Niedermayer
michaelni
Tue Nov 24 13:10:34 CET 2009
On Mon, Nov 23, 2009 at 06:24:00PM -0500, Justin Ruggles wrote:
> Thilo Borgmann wrote:
>
> > Michael Niedermayer schrieb:
> >> On Mon, Nov 23, 2009 at 11:28:35AM +0100, Thilo Borgmann wrote:
> >>> Michael Niedermayer schrieb:
> >>>> On Sat, Nov 14, 2009 at 02:15:30PM +0100, Thilo Borgmann wrote:
> >>>>> Hi,
> >>>>>
> >>>>> in order to support multi-channel correlation, reading and decoding of a
> >>>>> block has to be separated. This patch introduces ALSBockData struct for
> >>>>> that purpose.
> >>>> [...]
> >>>>> +/** Decodes the block data for a constant block
> >>>>> + */
> >>>>> +static void decode_const_block_data(ALSDecContext *ctx, ALSBlockData *bd)
> >>>>> +{
> >>>>> + int smp;
> >>>>> +
> >>>>> // write raw samples into buffer
> >>>>> - for (k = 0; k < block_length; k++)
> >>>>> - raw_samples[k] = const_val;
> >>>>> + for (smp = 0; smp < bd->block_length; smp++)
> >>>>> + bd->raw_samples[smp] = bd->const_val;
> >>>>> }
> >>>> memset32(dst, val, len){
> >>>> }
> >>>> would possibly be faster due to not doing bd-> in the loop
> >>> Ok will do...
> >>>
> >>>
> >>>> [...]
> >>>>> @@ -553,115 +579,132 @@
> >>>>> }
> >>>>>
> >>>>> if (get_bits1(gb))
> >>>>> - *shift_lsbs = get_bits(gb, 4) + 1;
> >>>>> + bd->shift_lsbs = get_bits(gb, 4) + 1;
> >>>>>
> >>>>> - store_prev_samples = (*js_blocks && raw_other) || *shift_lsbs;
> >>>>> + bd->store_prev_samples = (bd->js_blocks && bd->raw_other) || bd->shift_lsbs;
> >>>>>
> >>>>> -
> >>>>> if (!sconf->rlslms) {
> >>>>> if (sconf->adapt_order) {
> >>>>> - int opt_order_length = av_ceil_log2(av_clip((block_length >> 3) - 1,
> >>>>> + int opt_order_length = av_ceil_log2(av_clip((bd->block_length >> 3) - 1,
> >>>>> 2, sconf->max_order + 1));
> >>>>> - opt_order = get_bits(gb, opt_order_length);
> >>>>> + bd->opt_order = get_bits(gb, opt_order_length);
> >>>>> } else {
> >>>>> - opt_order = sconf->max_order;
> >>>>> + bd->opt_order = sconf->max_order;
> >>>>> }
> >>>>>
> >>>>> - if (opt_order) {
> >>>>> + if (bd->opt_order) {
> >>>>> int add_base;
> >>>>>
> >>>>> if (sconf->coef_table == 3) {
> >>>>> add_base = 0x7F;
> >>>>>
> >>>>> // read coefficient 0
> >>>>> - quant_cof[0] = 32 * parcor_scaled_values[get_bits(gb, 7)];
> >>>>> + bd->quant_cof[0] = 32 * parcor_scaled_values[get_bits(gb, 7)];
> >>>>>
> >>>>> // read coefficient 1
> >>>>> - if (opt_order > 1)
> >>>>> - quant_cof[1] = -32 * parcor_scaled_values[get_bits(gb, 7)];
> >>>>> + if (bd->opt_order > 1)
> >>>>> + bd->quant_cof[1] = -32 * parcor_scaled_values[get_bits(gb, 7)];
> >>>>>
> >>>>> // read coefficients 2 to opt_order
> >>>>> - for (k = 2; k < opt_order; k++)
> >>>>> - quant_cof[k] = get_bits(gb, 7);
> >>>>> + for (k = 2; k < bd->opt_order; k++)
> >>>>> + bd->quant_cof[k] = get_bits(gb, 7);
> >>>>> } else {
> >>>>> int k_max;
> >>>>> add_base = 1;
> >>>>>
> >>>>> // read coefficient 0 to 19
> >>>>> - k_max = FFMIN(opt_order, 20);
> >>>>> + k_max = FFMIN(bd->opt_order, 20);
> >>>>> for (k = 0; k < k_max; k++) {
> >>>>> int rice_param = parcor_rice_table[sconf->coef_table][k][1];
> >>>>> int offset = parcor_rice_table[sconf->coef_table][k][0];
> >>>>> - quant_cof[k] = decode_rice(gb, rice_param) + offset;
> >>>>> + bd->quant_cof[k] = decode_rice(gb, rice_param) + offset;
> >>>>> }
> >>>>>
> >>>>> // read coefficients 20 to 126
> >>>>> - k_max = FFMIN(opt_order, 127);
> >>>>> + k_max = FFMIN(bd->opt_order, 127);
> >>>>> for (; k < k_max; k++)
> >>>>> - quant_cof[k] = decode_rice(gb, 2) + (k & 1);
> >>>>> + bd->quant_cof[k] = decode_rice(gb, 2) + (k & 1);
> >>>>>
> >>>>> // read coefficients 127 to opt_order
> >>>>> - for (; k < opt_order; k++)
> >>>>> - quant_cof[k] = decode_rice(gb, 1);
> >>>>> + for (; k < bd->opt_order; k++)
> >>>>> + bd->quant_cof[k] = decode_rice(gb, 1);
> >>>>>
> >>>>> - quant_cof[0] = 32 * parcor_scaled_values[quant_cof[0] + 64];
> >>>>> + bd->quant_cof[0] = 32 * parcor_scaled_values[bd->quant_cof[0] + 64];
> >>>>>
> >>>>> - if (opt_order > 1)
> >>>>> - quant_cof[1] = -32 * parcor_scaled_values[quant_cof[1] + 64];
> >>>>> + if (bd->opt_order > 1)
> >>>>> + bd->quant_cof[1] = -32 * parcor_scaled_values[bd->quant_cof[1] + 64];
> >>>>> }
> >>>>>
> >>>>> - for (k = 2; k < opt_order; k++)
> >>>>> - quant_cof[k] = (quant_cof[k] << 14) + (add_base << 13);
> >>>>> + for (k = 2; k < bd->opt_order; k++)
> >>>>> + bd->quant_cof[k] = (bd->quant_cof[k] << 14) + (add_base << 13);
> >>>> it might make sense to have commonly used variables like opt_order on the
> >>>> stack intead of just accessable through a pointer
> >>> Yes.
> >>>
> >>>> besides does this patch lead to any slowdown?
> >>> The benchmark results showed a little slowdown but at least hardly
> >>> noticable on the command line...
> >>> We had this issue at the soc list. Ended up with me providing some
> >>> assembler file grep results for checking inlining done by the compiler.
> >> was the slowdown fixed or not?
> >> "not" is bad
> >
> > It was not. You blamed the many bd-> to be a reason for the slowdown and
> > requested a check for wrong inlining before considering other
> > possibilities to make this faster. I'm not able to see anything about
> > the inlining stuff in assembler files so I provided a grep result you
> > asked for.
> >
> > See:
> > https://lists.mplayerhq.hu/pipermail/ffmpeg-soc/2009-October/008364.html
> >
> > I'm ready to fix that slowdown issue, all I need is a little advise
> > about how to... the obvious way to use local variables to store
> > bd->something and removing many dereferences that way did not yield a
> > significant reduction of the slowdown.
>
> grepping for "call" and removing duplicates gives:
>
> combined:
> call _parse_bs_info
> call _decode_end
> call _read_block_data
> call _zero_remaining
>
> separate:
> call _parse_bs_info
> call _decode_end
> call _decode_var_block_data
> call _zero_remaining
> call _read_var_block_data
> call _decode_blocks_ind
>
> It seems that decode_blocks_ind is inlined in the combined version but
> not in the separate version.
so, does adding always_inline help?
(note, expect that adding this to one function will make gcc stop
inlining other unrelated functions so you might need many always_inline
to force gcc not to be silly)
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even known they had done so. -- Xenophanes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20091124/e0cc98cb/attachment.pgp>
More information about the ffmpeg-devel
mailing list