[FFmpeg-devel] [PATCH] HE-AAC v1 try 5

Mon Mar 8 05:26:16 CET 2010

On Sat, Mar 6, 2010 at 4:12 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Fri, Mar 05, 2010 at 07:00:14PM -0500, Alex Converse wrote:
>> On Thu, Mar 4, 2010 at 5:20 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Wed, Mar 03, 2010 at 10:07:12PM -0500, Alex Converse wrote:
[...]
>>
>> Thanks for the feedback, most of it was very useful. However on a few
>> points I vehemently disagree.
>>
>> Attached is an updated patch.
>>
>> At this point we are significantly faster than faad. With your
>> approval I would like to commit this and move on to close the feature
>> gap with faad (work on ps).
>
> The code is of less than ideal quality currently
> Still as this is a high priority feature i dont object to it being commited
> given that it is improved further in the future when people find more
> issues in it.
> One thing though must be resolved first, and that are the variable names
> they are all 1:1 copied from the spec and i do not know if we can legally
> include this in a *GPL program in terms of copyright.
> You seem to not even know with certainity what some variable names mean ...
> Not to mention these are extreemly poor names on technical grounds
> That is unless one of our legal experts says, this is ok
>
> also if we did find some one selling code outside the *GPL and
> that used 100% identical variable names to ffmpeg we would have a quite
> strong case against him if some ffmpeg devel choose to pursue it.
>

I'm going to agree with Mans here.

>> >> +static inline void remove_table_element_int16(int16_t *table, uint8_t *last_el,
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?int el)
>> >> +{
>> >> + ? ?memmove(table + el, table + el + 1, (*last_el - el)*sizeof(int16_t));
>> >> + ? ?(*last_el)--;
>> >> +}
>> >
>> > please use variable names that make sense, that is size or length not last_el
>> > and el should be index

last_el(ement) has a sight semantic difference from length.
last_el = length-1

But remove_table_element_int16 is no longer relevant

>> >
>>
>> el(ement) and last_el(ement) make plenty of sense to me.
>
> it can very easily be understood as:
>
> for(i=0; table[i] != el; i++)
> ? ?;
> memmove(table + i, table + i + 1, ....
>
> i think index and length are simply clearer
>
>
> [...]
>
>>
>> >
>> >
>> >> + ? ? ? ? ? ?if (sbr->f_tablelim[k] == sbr->f_tablelim[k-1] ||
>> >> + ? ? ? ? ? ? ? ?!in_table_int16(patch_borders, sbr->num_patches, sbr->f_tablelim[k]))
>> >> + ? ? ? ? ? ? ? ?remove_table_element_int16(sbr->f_tablelim, &sbr->n_lim, k);
>> >> + ? ? ? ? ? ?else if (!in_table_int16(patch_borders, sbr->num_patches, sbr->f_tablelim[k-1]))
>> >> + ? ? ? ? ? ? ? ?remove_table_element_int16(sbr->f_tablelim, &sbr->n_lim, k-1);
>> >> + ? ? ? ? ? ?else
>> >> + ? ? ? ? ? ? ? ?k++;
>> >> + ? ? ? ?}
>> >
>> > this code seems somewhat obfuscated
>> > we have 2 tables A and B, they are combined into table C and sorted
>> > then starting from 0 to size all elements from C are removed that are
>> > too close to their previous element unless they where in B
>> >
>> > this can trivially be done in O(|C|) time your code though needs
>> > O(|C|*|C|) besides that the code seems messy which probably is the
>> > bigger issue.
>> > note, iam not asking you to make it faster if speed is irrelevant but
>> > it could be made less messy i think
>> >
>>
>> I've simplified the logic slightly by removing the continue.
>> It looks O(|C||B|) and fairly clean to me. If you have an idea that's
>> less messy, please propose it.
>
> i would use 2 arrays, that way remove_table_element_int16 would become
> in++
> or
> out--

*(out-1) = *in++;

> and the keep case would become
> *out++ = *in++
>

Fixed

>
>
>>
> [...]
>> >
>> > also i seriously doubt that the whole ac^k needs to be scaned for the
>> > minimum element, at the least you can stop once elements are more than 1 or
>> > 2 larger than the minimum, and that assumes rounding has a significant
>> > effect
>> >
>>
>> If you wish to pursue this please provide a benchmark justifying the
>> extra complexity
>
> Its surely faster, i dont know what effect it has overall, i guess this
> can be tested after commit ...
>
>
>>
>> >
>> > [...]
>> >> +
>> >> +/// Master Frequency Band Table (14496-3 sp04 p194)
>> >> +static int sbr_make_f_master(AACContext *ac, SpectralBandReplication *sbr,
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? SpectrumParameters *spectrum)
>> >> +{
>> >> + ? ?unsigned int temp;
>> >> + ? ?unsigned int start_min, stop_min;
>> >> + ? ?int k;
>> >> + ? ?const uint8_t *sbr_offset_ptr;
>> >> + ? ?int16_t stop_dk[13];
>> >> +
>> >> + ? ?if (sbr->sample_rate < 32000) {
>> >> + ? ? ? ?temp = 3000;
>> >> + ? ?} else if (sbr->sample_rate < 64000) {
>> >> + ? ? ? ?temp = 4000;
>> >> + ? ?} else
>> >> + ? ? ? ?temp = 5000;
>> >> +
>> >> + ? ?start_min = ((temp << 7) + (sbr->sample_rate >> 1)) / sbr->sample_rate;
>> >> + ? ?stop_min ?= ((temp << 8) + (sbr->sample_rate >> 1)) / sbr->sample_rate;
>> >> +
>> >> + ? ?switch (sbr->sample_rate) {
>> >> + ? ?case 16000:
>> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[0];
>> >> + ? ? ? ?break;
>> >> + ? ?case 22050:
>> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[1];
>> >> + ? ? ? ?break;
>> >> + ? ?case 24000:
>> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[2];
>> >> + ? ? ? ?break;
>> >> + ? ?case 32000:
>> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[3];
>> >> + ? ? ? ?break;
>> >> + ? ?case 44100: case 48000: case 64000:
>> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[4];
>> >> + ? ? ? ?break;
>> >> + ? ?case 88200: case 96000: case 128000: case 176400: case 192000:
>> >> + ? ? ? ?sbr_offset_ptr = sbr_offset[5];
>> >> + ? ? ? ?break;
>> >> + ? ?default:
>> >> + ? ? ? ?av_log(ac->avccontext, AV_LOG_ERROR,
>> >> + ? ? ? ? ? ? ? "Unsupported sample rate for SBR: %d\n", sbr->sample_rate);
>> >> + ? ? ? ?return -1;
>> >> + ? ?}
>> >> +
>> >> + ? ?sbr->k[0] = start_min + sbr_offset_ptr[spectrum->bs_start_freq];
>> >> +
>> >> + ? ?if (spectrum->bs_stop_freq < 14) {
>> >> + ? ? ? ?sbr->k[2] = stop_min;
>> >
>> >> + ? ? ? ?make_bands(stop_dk, stop_min, 64, 13);
>> >> + ? ? ? ?qsort(stop_dk, 13, sizeof(stop_dk[0]), qsort_comparison_function_int16);
>> >
>> > just confirming, this array is not guranteed to be sorted already?
>> >
>>
>> If it comes out of make_bands it isn't sorted.
>>
>> If you inspect the history you will see adding these sorts fixed specific bugs.
>
> understood, though if this is speed relevant, a insertion sort or some
> sort that is fast for nearly sorted data would be better than qsort()
>

speed does not appear relevant in this function at this time.

>
> [...]
>> >
>> > [...]
>> >> + ? ?// temp == max number of QMF subbands
>> >> + ? ?if (sbr->sample_rate <= 32000) {
>> >> + ? ? ? ?temp = 48;
>> >> + ? ?} else if (sbr->sample_rate == 44100) {
>> >> + ? ? ? ?temp = 35;
>> >> + ? ?} else if (sbr->sample_rate >= 48000)
>> >> + ? ? ? ?temp = 32;
>> >> +
>> >> + ? ?if (sbr->k[2] - sbr->k[0] > temp) {
>> >> + ? ? ? ?av_log(ac->avccontext, AV_LOG_ERROR,
>> >> + ? ? ? ? ? ? ? "Invalid bitstream, too many QMF subbands: %d\n", sbr->k[2] - sbr->k[0]);
>> >> + ? ? ? ?return -1;
>> >> + ? ?}
>> >
>> >
>> >
>> >> +
>> >> + ? ?return 0;
>> >> +}
>> >> +
>> >> +/// High Frequency Generation - Patch Construction (14496-3 sp04 p216 fig. 4.46)
>> >> +static int sbr_hf_calc_npatches(AACContext *ac, SpectralBandReplication *sbr)
>> >> +{
>> >> + ? ?int i, k, sb = 0;
>> >
>> >> + ? ?int msb = sbr->k[0];
>> >> + ? ?int usb = sbr->k[4];
>> >
>> > what is usb
>> Possibly upper subband?
>> > what is msb
>> Possibly middle subband?
>
> "possibly"? you dont know what the code you submit means?
>

I've made no attempts to conceal the fact that I am not the author of
a substantial portion of this code.

>
> [...]
>> > and the condition can probably be written wthout odd
>> > maybe:
>> > sb > ((msb - 1)&~1) + sbr->k[0]
>> > works?
>> >
>>
>> Is there an advantage to eliminating odd?
>
> it appears cleaner to me, i see no other advanatge
>
>
> [...]
>
>> >> +static void sbr_grid_copy(SBRData *dst, const SBRData *src) {
>> >> + ? ?//These variables are saved from the previous frame rather than copied
>> >> + ? ?dst->bs_freq_res[0] = dst->bs_freq_res[dst->bs_num_env[1]];
>> >> + ? ?dst->bs_num_env[0] ?= dst->bs_num_env[1];
>> >> +
>> >
>> >> + ? ?//These variables are read from the bitstream and therefore copied
>> >> + ? ?memcpy(dst->bs_freq_res+1, src->bs_freq_res+1, sizeof(dst->bs_freq_res)-1);
>> >> + ? ?memcpy(dst->bs_num_env+1, ?src->bs_num_env+1, ?sizeof(dst->bs_num_env)-1);
>> >
>> > the len is only correct when its a (u)int8_t
>> >
>>
>> And it is. What a "lucky" coincidence.
>
> Designed with the hope it wont be changed by anyone ever
> -sizeof(*dst->bs_freq_res)
> would make it more robust against such change
>
>
[...]

>> > [...]
>> >> +/// SBR Envelope and Noise Floor Decoding (14496-3 sp04 p201)
>> >> +static void sbr_env_noise_floors(SpectralBandReplication *sbr, SBRData *ch_data,
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? int ch)
>> >> +{
>> >> + ? ?int delta = (ch == 1 && sbr->bs_coupling == 1) + 1;
>> >> + ? ?int i, k, L;
>> >> + ? ?const int temp = sbr->n[1] & 1;
>> >> + ? ?for (L = 0; L < ch_data->bs_num_env[1]; L++) {
>> >> + ? ? ? ?if (ch_data->bs_df_env[L]) {
>> >> + ? ? ? ? ? ?// bs_freq_res[0] == bs_freq_res[bs_num_env[1]] from prev frame
>> >> + ? ? ? ? ? ?if (ch_data->bs_freq_res[L + 1] == ch_data->bs_freq_res[L]) {
>> >> + ? ? ? ? ? ? ? ?for (k = 0; k < sbr->n[ch_data->bs_freq_res[L + 1]]; k++)
>> >> + ? ? ? ? ? ? ? ? ? ?ch_data->env_facs[L + 1][k] = ch_data->env_facs[L][k] + delta * ch_data->bs_data_env[L][k];
>> >> + ? ? ? ? ? ?} else if (ch_data->bs_freq_res[L + 1]) {
>> >> + ? ? ? ? ? ? ? ?for (k = 0; k < sbr->n[ch_data->bs_freq_res[L + 1]]; k++) {
>> >> + ? ? ? ? ? ? ? ? ? ?i = (k + temp) >> 1; // find i such that f_tablelow[i] <= f_tablehigh[k] < f_tablelow[i + 1]
>> >> + ? ? ? ? ? ? ? ? ? ?ch_data->env_facs[L + 1][k] = ch_data->env_facs[L][i] + delta * ch_data->bs_data_env[L][k];
>> >> + ? ? ? ? ? ? ? ?}
>> >> + ? ? ? ? ? ?} else {
>> >> + ? ? ? ? ? ? ? ?for (k = 0; k < sbr->n[ch_data->bs_freq_res[L + 1]]; k++) {
>> >> + ? ? ? ? ? ? ? ? ? ?i = k ? 2*k - temp : 0; // find i such that f_tablehigh[i] == f_tablelow[k]
>> >> + ? ? ? ? ? ? ? ? ? ?ch_data->env_facs[L + 1][k] = ch_data->env_facs[L][i] + delta * ch_data->bs_data_env[L][k];
>> >> + ? ? ? ? ? ? ? ?}
>> >> + ? ? ? ? ? ?}
>> >> + ? ? ? ?} else {
>> >> + ? ? ? ? ? ?ch_data->env_facs[L + 1][0] = delta * ch_data->bs_data_env[L][0];
>> >> + ? ? ? ? ? ?for (k = 1; k < sbr->n[ch_data->bs_freq_res[L + 1]]; k++)
>> >> + ? ? ? ? ? ? ? ?ch_data->env_facs[L + 1][k] = ch_data->env_facs[L + 1][k - 1] + delta * ch_data->bs_data_env[L][k];
>> >> + ? ? ? ?}
>> >> + ? ?}
>> >
>> > please dont use l as variable name it looks like 1 in many fonts and
>> > makes the code harder to read.
>> > It is not reasonable to expect everyone who wishes to read the code to
>> > switch fonts or run a search and replace over the code, like i just did
>> > so i can review the code.
>> >
>>
>> Find a font that doesn't suck. l is the canonical variable for
>> iterating over SBR envelopes.
>
> I surely can, but
> as said this is unreasonable to expect from every reader of ffmpeg code.
> and making the code less readable because of your stubbornness seems
> quite unreasonable to me.
> Please change l to some other identifer
>

It isn't worth my time arguing this. Changed. Enjoy your 1980s fonts

>
> [...]
>> >
>> >> + ? ? ? ?for (L = 1; L <= sbr->data[0].bs_num_env[1]; L++) {
>> >> + ? ? ? ? ? ?for (k = 0; k < sbr->n[sbr->data[0].bs_freq_res[L]]; k++) {
>> >> + ? ? ? ? ? ? ? ?float temp1 = exp2f(sbr->data[0].env_facs[L][k] * alpha + 7.0f);
>> >> + ? ? ? ? ? ? ? ?float temp2 = exp2f((pan_offset - sbr->data[1].env_facs[L][k]) * alpha);
>> >
>> > isnt env_facs integer here? i so this can probably be done more
>> > efficiently with a LUT
>> >
>>
>> env_facs can get quite large. The LAV of the delta coded version is
>> 60. I'm not sure of the exact bounds.
>
> alpha can be 0.5 or 1
> float uses 8bit exponents
> if we assume the float does not overflow, that makes <512 entries
> but i very seriously doubt that the dynamic range of any variable is that
> large
> the question is primarely if this is speedrelevant, if not exp2f() is fine
> otherwise a table is (if faster) the better choice
>

These calls are not speed critical at the moment.

>
>
> [...]
>> >
>> >> + ? ?return -1;
>> >> +}
>> >> +
>> >
>> >> +/// Generate the subband filtered lowband
>> >> +static int sbr_lf_gen(AACContext *ac, SpectralBandReplication *sbr,
>> >> + ? ? ? ? ? ? ? ? ? ? ?float X_low[32][40][2], float W[2][32][32][2]) {
>> >
>> > nothing being const looks suspicous
>> >
>>
>> not being const is the easiest way to keep gcc from getting pissy
>> while looking for real warnings.
>
> gcc does not complain if all your variables are marked properly
> it just complains if a subset is marked const
>

gcc believes that multilevel const casts are fundamentally unsafe

/home/alex/Projects/ffmpeg/aac-sbr/ffmpeg/libavcodec/aacsbr.c:1357:
note: expected ?const float (*)[2]? but argument is of type ?float
(*)[2]?

>
>> "Fixed"
>>
>> >
>> >> + ? ?int k, L;
>> >> + ? ?const int t_HFGen = 8;
>> >> + ? ?const int l_f = 32;
>> >> + ? ?memset(X_low, 0, 32*sizeof(*X_low));
>> >> + ? ?for (k = 0; k < sbr->k[4]; k++) {
>> >> + ? ? ? ?for (L = t_HFGen; L < l_f + t_HFGen; L++) {
>> >> + ? ? ? ? ? ?X_low[k][L][0] = W[1][L - t_HFGen][k][0];
>> >> + ? ? ? ? ? ?X_low[k][L][1] = W[1][L - t_HFGen][k][1];
>> >> + ? ? ? ?}
>> >> + ? ?}
>> >> + ? ?for (k = 0; k < sbr->k[3]; k++) {
>> >> + ? ? ? ?for (L = 0; L < t_HFGen; L++) {
>> >> + ? ? ? ? ? ?X_low[k][L][0] = W[0][L + l_f - t_HFGen][k][0];
>> >> + ? ? ? ? ? ?X_low[k][L][1] = W[0][L + l_f - t_HFGen][k][1];
>> >> + ? ? ? ?}
>> >> + ? ?}
>> >
>> > the memset above seems to set some elements to 0 that are never read
>>
>> It avoids per k calls to memset as well as extra logic at the bottom.
>> maybe rectangle filling functions can be used here?
>
> whatever you prefer but a 10kb memset is not good if it can be reduced
>
>

agreed

>>
>> >
>> > also cant this copying around be avoided and "W" be accessed directly
>> > by the code that uses X_low?
>> >
>>
>> It breaks down to a trade off of what's more offensive, the extra
>> copying or all the extra logic needed to map X_low to W[0] W[1] or 0
>> appropriately.
>
> id say its more a question of whats faster&smaller than whats more
> offensive
>

By offensive I did mean for smaller and faster to be part of the equation.
Complicated addressing schemes can slow things down and bloat object
code substantially

[...]

>> >> + ? ? ? ? ? ? ? ?e_curr[L][m] = sum * recip_env_size;
>> >> + ? ? ? ? ? ?}
>> >> + ? ? ? ?}
>> >> + ? ?} else {
>> >> + ? ? ? ?int k, p;
>> >> +
>> >> + ? ? ? ?for (L = 0; L < ch_data->bs_num_env[1]; L++) {
>> >> + ? ? ? ? ? ?const int env_size = 2 * (ch_data->t_env[L + 1] - ch_data->t_env[L]);
>> >> + ? ? ? ? ? ?int ilb = ch_data->t_env[L] ? ? * 2 + ENVELOPE_ADJUSTMENT_OFFSET;
>> >> + ? ? ? ? ? ?int iub = ch_data->t_env[L + 1] * 2 + ENVELOPE_ADJUSTMENT_OFFSET;
>> >> + ? ? ? ? ? ?const uint16_t *table = ch_data->bs_freq_res[L + 1] ? sbr->f_tablehigh : sbr->f_tablelow;
>> >> +
>> >> + ? ? ? ? ? ?for (p = 0; p < sbr->n[ch_data->bs_freq_res[L + 1]]; p++) {
>> >> + ? ? ? ? ? ? ? ?float sum = 0.0f;
>> >> + ? ? ? ? ? ? ? ?const int den = env_size * (table[p + 1] - table[p]);
>> >> +
>> >> + ? ? ? ? ? ? ? ?for (k = table[p]; k < table[p + 1]; k++) {
>> >
>> >> + ? ? ? ? ? ? ? ? ? ?for (i = ilb; i < iub; i++) {
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?sum += X_high[k][i][0] * X_high[k][i][0] +
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? X_high[k][i][1] * X_high[k][i][1];
>> >> + ? ? ? ? ? ? ? ? ? ?}
>> >
>> > duplicate
>>
>> It's a simple dot product. There are also some duplicate i++s
>> scattered across the code as well.
>
> dot_product() or a call to dsp would be nicer
>

I will create an unaligned dsp dot product... later

>> > [...]
>> >> + ? ?static const int8_t phi[2][4] = {
>> >> + ? ? ? ?{ ?1, ?0, -1, ?0}, // real
>> >> + ? ? ? ?{ ?0, ?1, ?0, -1}, // imaginary
>> >> + ? ?};
>> >> + ? ?float (*g_temp)[48] = ch_data->g_temp, (*q_temp)[48] = ch_data->q_temp;
>> >> + ? ?int indexnoise = ch_data->f_indexnoise;
>> >> + ? ?int indexsine ?= ch_data->f_indexsine;
>> >> + ? ?memcpy(Y[0], Y[1], sizeof(Y[0]));
>> >> +
>> >> + ? ?if (sbr->reset) {
>> >> + ? ? ? ?for (i = 0; i < h_SL; i++) {
>> >> + ? ? ? ? ? ?memcpy(g_temp[i + 2*ch_data->t_env[0]], sbr->gain[0], m_max * sizeof(sbr->gain[0][0]));
>> >> + ? ? ? ? ? ?memcpy(q_temp[i + 2*ch_data->t_env[0]], sbr->q_m[0], ?m_max * sizeof(sbr->q_m[0][0]));
>> >> + ? ? ? ?}
>> >> + ? ?} else if (h_SL) {
>> >> + ? ? ? ?memcpy(g_temp[2*ch_data->t_env[0]], g_temp[2*ch_data->t_env_num_env_old], 4*sizeof(g_temp[0]));
>> >> + ? ? ? ?memcpy(q_temp[2*ch_data->t_env[0]], q_temp[2*ch_data->t_env_num_env_old], 4*sizeof(q_temp[0]));
>> >> + ? ?}
>> >> +
>> >> + ? ?for (L = 0; L < ch_data->bs_num_env[1]; L++) {
>> >> + ? ? ? ?for (i = 2 * ch_data->t_env[L]; i < 2 * ch_data->t_env[L + 1]; i++) {
>> >> + ? ? ? ? ? ?memcpy(g_temp[h_SL + i], sbr->gain[L], m_max * sizeof(sbr->gain[0][0]));
>> >> + ? ? ? ? ? ?memcpy(q_temp[h_SL + i], sbr->q_m[L], ?m_max * sizeof(sbr->q_m[0][0]));
>> >> + ? ? ? ?}
>> >> + ? ?}
>> >> +
>> >> + ? ?for (L = 0; L < ch_data->bs_num_env[1]; L++) {
>> >> + ? ? ? ?for (i = 2 * ch_data->t_env[L]; i < 2 * ch_data->t_env[L + 1]; i++) {
>> >> + ? ? ? ? ? ?int phi_sign = (1 - 2*(kx & 1));
>> >> +
>> >> + ? ? ? ? ? ?if (h_SL && L != l_a[0] && L != l_a[1]) {
>> >> + ? ? ? ? ? ? ? ?for (m = 0; m < m_max; m++) {
>> >> + ? ? ? ? ? ? ? ? ? ?const int idx1 = i + h_SL;
>> >> + ? ? ? ? ? ? ? ? ? ?float g_filt = 0.0f;
>> >> + ? ? ? ? ? ? ? ? ? ?for (j = 0; j <= h_SL; j++)
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?g_filt += g_temp[idx1 - j][m] * h_smooth[j];
>> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][0] =
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?X_high[m + kx][i + ENVELOPE_ADJUSTMENT_OFFSET][0] * g_filt;
>> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][1] =
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?X_high[m + kx][i + ENVELOPE_ADJUSTMENT_OFFSET][1] * g_filt;
>> >> + ? ? ? ? ? ? ? ?}
>> >> + ? ? ? ? ? ?} else {
>> >> + ? ? ? ? ? ? ? ?for (m = 0; m < m_max; m++) {
>> >> + ? ? ? ? ? ? ? ? ? ?const float g_filt = g_temp[i + h_SL][m];
>> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][0] =
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?X_high[m + kx][i + ENVELOPE_ADJUSTMENT_OFFSET][0] * g_filt;
>> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][1] =
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?X_high[m + kx][i + ENVELOPE_ADJUSTMENT_OFFSET][1] * g_filt;
>> >> + ? ? ? ? ? ? ? ?}
>> >> + ? ? ? ? ? ?}
>> >> +
>> >
>> >> + ? ? ? ? ? ?if (L != l_a[0] && L != l_a[1]) {
>> >> + ? ? ? ? ? ? ? ?for (m = 0; m < m_max; m++) {
>> >> + ? ? ? ? ? ? ? ? ? ?indexnoise = (indexnoise + 1) & 0x1ff;
>> >> + ? ? ? ? ? ? ? ? ? ?if (sbr->s_m[L][m]) {
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][0] +=
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?sbr->s_m[L][m] * phi[0][indexsine];
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][1] +=
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?sbr->s_m[L][m] * (phi[1][indexsine] * phi_sign);
>> >> + ? ? ? ? ? ? ? ? ? ?} else {
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?float q_filt;
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?if (h_SL) {
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?const int idx1 = i + h_SL;
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?q_filt = 0.0f;
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?for (j = 0; j <= h_SL; j++)
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?q_filt += q_temp[idx1 - j][m] * h_smooth[j];
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?} else {
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?q_filt = q_temp[i][m];
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?}
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][0] +=
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?q_filt * sbr_noise_table[indexnoise][0];
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][1] +=
>> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?q_filt * sbr_noise_table[indexnoise][1];
>> >> + ? ? ? ? ? ? ? ? ? ?}
>> >> + ? ? ? ? ? ? ? ? ? ?phi_sign = -phi_sign;
>> >> + ? ? ? ? ? ? ? ?}
>> >> + ? ? ? ? ? ?} else {
>> >> + ? ? ? ? ? ? ? ?indexnoise = (indexnoise + m_max) & 0x1ff;
>> >> + ? ? ? ? ? ? ? ?for (m = 0; m < m_max; m++) {
>> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][0] +=
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?sbr->s_m[L][m] * phi[0][indexsine];
>> >> + ? ? ? ? ? ? ? ? ? ?Y[1][i][m + kx][1] +=
>> >> + ? ? ? ? ? ? ? ? ? ? ? ?sbr->s_m[L][m] * (phi[1][indexsine] * phi_sign);
>> >> + ? ? ? ? ? ? ? ? ? ?phi_sign = -phi_sign;
>> >> + ? ? ? ? ? ? ? ?}
>> >> + ? ? ? ? ? ?}
>> >> + ? ? ? ? ? ?indexsine = (indexsine + 1) & 3;
>> >
>> > all the multiplies by phi* are 0,1,-1
>> > half of them are by 0
>> >
>> >
>>
>> Additional loop unrolling can be done at a later date. This is
>> sufficient for the time being.
>
> I thought more of writting it along the lines of:
> -Y[1][i][m + kx][0] += sbr->s_m[L][m] * phi[0][indexsine];
> -Y[1][i][m + kx][1] += sbr->s_m[L][m] * phi[1][indexsine] * phi_sign;
> +Y[1][i][m + kx][idx] += sbr->s_m[L][m] * phi_sign;
>

I don't follow

>> >> +
>> >> +/**
>> >> + * Spectral Band Replication header - spectrum parameters that invoke a reset if they differ from the previous header.
>> >> + */
>> >> +typedef struct {
>> >> + ? ?uint8_t bs_start_freq;
>> >> + ? ?uint8_t bs_stop_freq;
>> >> + ? ?uint8_t bs_xover_band;
>> >> +
>> >> + ? ?/**
>> >> + ? ? * @defgroup bs_header_extra_1 ? ? Variables associated with bs_header_extra_1
>> >> + ? ? * @{
>> >> + ? ? */
>> >> + ? ?uint8_t bs_freq_scale;
>> >> + ? ?uint8_t bs_alter_scale;
>> >> + ? ?uint8_t bs_noise_bands;
>> >> + ? ?/** @} */
>> >> +} SpectrumParameters;
>> >> +
>> >> +#define SBR_SYNTHESIS_BUF_SIZE ((1280-128)*2)
>> >> +
>> >> +/**
>> >> + * Spectral Band Replication per channel data
>> >> + */
>> >> +typedef struct {
>> >> + ? ?/**
>> >> + ? ? * @defgroup bitstream ? ? Main bitstream data variables
>> >> + ? ? * @{
>> >> + ? ? */
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_frame_class;
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_add_harmonic_flag;
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_num_env[2];
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_freq_res[7];
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_var_bord[2];
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_num_rel[2];
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_rel_bord[2][3];
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_pointer;
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_num_noise;
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_df_env[5];
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_df_noise[2];
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_invf_mode[2][5];
>> >> + ? ?int32_t ? ? ? ? ? ?bs_data_env[7][32];
>> >> + ? ?int32_t ? ? ? ? ? ?bs_data_noise[2][5];
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_add_harmonic[48];
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_amp_res;
>> >> + ? ?/** @} */
>> >> +
>> >> + ? ?/**
>> >> + ? ? * @defgroup state ? ? ? ? State variables
>> >> + ? ? * @{
>> >> + ? ? */
>> >> + ? ?DECLARE_ALIGNED(16, float, synthesis_filterbank_samples)[SBR_SYNTHESIS_BUF_SIZE];
>> >> + ? ?DECLARE_ALIGNED(16, float, analysis_filterbank_samples) [1312];
>> >> + ? ?int ? ? ? ? ? ? ? ?synthesis_filterbank_samples_offset;
>> >> + ? ?int ? ? ? ? ? ? ? ?l_a[2];
>> >> + ? ?float ? ? ? ? ? ? ?bw_array[2][5];
>> >> + ? ?float ? ? ? ? ? ? ?W[2][32][32][2];
>> >> + ? ?float ? ? ? ? ? ? ?Y[2][38][64][2];
>> >> + ? ?float ? ? ? ? ? ? ?g_temp[42][48];
>> >> + ? ?float ? ? ? ? ? ? ?q_temp[42][48];
>> >> + ? ?uint8_t ? ? ? ? ? ?s_indexmapped[8][48];
>> >> + ? ?float ? ? ? ? ? ? ?env_facs[6][48];
>> >> + ? ?float ? ? ? ? ? ? ?noise_facs[3][5];
>> >> + ? ?uint8_t ? ? ? ? ? ?t_env[8];
>> >> + ? ?uint8_t ? ? ? ? ? ?t_env_num_env_old;
>> >> + ? ?uint8_t ? ? ? ? ? ?t_q[3];
>> >> + ? ?uint16_t ? ? ? ? ? f_indexnoise;
>> >> + ? ?uint8_t ? ? ? ? ? ?f_indexsine;
>> >> + ? ?/** @} */
>> >> +} SBRData;
>> >> +
>> >> +/**
>> >> + * Spectral Band Replication
>> >> + */
>> >> +typedef struct {
>> >> + ? ?int32_t ? ? ? ? ? ?sample_rate;
>> >> + ? ?uint8_t ? ? ? ? ? ?start;
>> >> + ? ?uint8_t ? ? ? ? ? ?reset;
>> >> + ? ?SpectrumParameters spectrum_params[2];
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_amp_res_header;
>> >> + ? ?/**
>> >> + ? ? * @defgroup bs_header_extra_2 ? ? variables associated with bs_header_extra_2
>> >> + ? ? * @{
>> >> + ? ? */
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_limiter_bands;
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_limiter_gains;
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_interpol_freq;
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_smoothing_mode;
>> >> + ? ?/** @} */
>> >> + ? ?uint8_t ? ? ? ? ? ?bs_coupling;
>> >
>> >> + ? ?uint8_t ? ? ? ? ? ?k[5]; ///< k0, k1, k2, kx', and kx respectively
>> >
>> > this seems never used as an array, why is it one?
>> >
>>
>> I don't know.
>>
>> kx and kx' are together due to convention
>>
>
>> > also why are so many variables fixed small sizes? this can cause a
>> > speed loss on some architectures, and even x86 with gcc seems to have a
>> > dislike for such things from what ive seen recently in h264 optims
>> >
>>
>> My guess is it was done like this because the SBR context is so large
>> and per channel.
>
> what speed effect does it have to change them to int (except (large) arrays) ?
>

There is a tiny speedup but and channel size only increases by 90 bytes. applied

>
>>
>> >
>> >> + ? ?uint8_t ? ? ? ? ? ?m[2]; ///< M' and M respectively
>> >> + ? ?uint8_t ? ? ? ? ? ?n_master;
>> >> + ? ?SBRData ? ? ? ? ? ?data[2];
>> >> + ? ?uint8_t ? ? ? ? ? ?n[2]; ///< n_low and n_high respectively
>> >> + ? ?uint8_t ? ? ? ? ? ?n_q;
>> >> + ? ?uint8_t ? ? ? ? ? ?n_lim;
>> >> + ? ?uint16_t ? ? ? ? ? f_master[49];
>> >> + ? ?uint16_t ? ? ? ? ? f_tablelow[25];
>> >> + ? ?uint16_t ? ? ? ? ? f_tablehigh[49];
>> >> + ? ?uint16_t ? ? ? ? ? f_tablenoise[6];
>> >> + ? ?uint16_t ? ? ? ? ? f_tablelim[29];
>> > [...]
>> >> + ? ?float ? ? ? ? ? ? ?X_low[32][40][2];
>> >> + ? ?float ? ? ? ? ? ? ?X_high[64][40][2];
>> >> + ? ?DECLARE_ALIGNED(16, float, X)[2][32][64];
>> >> + ? ?float ? ? ? ? ? ? ?alpha0[64][2];
>> >> + ? ?float ? ? ? ? ? ? ?alpha1[64][2];
>> >> + ? ?float ? ? ? ? ? ? ?e_origmapped[7][48];
>> >> + ? ?float ? ? ? ? ? ? ?q_mapped[7][48];
>> >> + ? ?uint8_t ? ? ? ? ? ?s_mapped[7][48];
>> >> + ? ?float ? ? ? ? ? ? ?e_curr[7][48];
>> >> + ? ?float ? ? ? ? ? ? ?q_m[7][48];
>> >> + ? ?float ? ? ? ? ? ? ?s_m[7][48];
>> >
>> > It would tremendously help code readability and thus also me reviewing
>> > if these variables where things that made sense and did not require one
>> > to look _each_ up in a spec that isnt freely available.
>> >
>>
>> I find it much easier when writing the code and fixing bugs if the
>> variables do match the spec rather than having to look across some
>> equivalence table.
>
> So you dont know yourself what these variables mean.
> its just 2 funny letters that are easy to find in the spec.
> the problem with this is that understanding is a prerequesite for writing
> a good implementation. Good both in speed as well as readability.
> With h264 for example this would amount to 5 recursive calls through
> functions instead of x[-1] to get the left pixel. Its clear the aac spec

We already use the trick

> is not written that poorly but still id expect quite a bit of simplifications
> to become obvious if one could look at a piece of code and actually
> understand what the variables used contain instead of having to look that up
> for each in some equivalence table.
>

I added comments explaining some of them

[...]