[FFmpeg-devel] [PATCH] SSE optimization for DCA decoder

Mon Sep 1 09:36:16 CEST 2008

Michael Niedermayer wrote:

> nice, but as you probably already know, my highlevel optimizations
> broke your patch.
> 
> If you want to update it, also look at ff_mpa_synth_filter() which
> performs the same windowing operation but with a quite different
> implementation, i do not know which way is more efficient in SIMD,
> actually i dont know which is better for C either ...

IMHO, it is still too early to do this, because high-level
"optimizations" are still being missed (quoted because no further speed
gain on the "window" operation itself seems possible). As I said earlier,
the funky indexing seems to mean either two transforms done at once, or
maybe simply a longer transform than the one written. In support of this
view, here is the inverse subband transform (for the encoder), rewritten
according to
http://ccrma.stanford.edu/~jos/sasp/Pseudo_QMF_Cosine_Modulation_Filter.html
(thanks to Benjamin Larsson for the important keywords!). It still uses
the naive form of the DCT:

static void dcaenc_subband_transform(dcaenc_context c, const int32_t *input)
{
        int ch, subs, i, k, j;

        for (ch = 0; ch < c->fullband_channels; ch++) {
                /* History is copied because it is also needed for PSY */
                int32_t hist[512];
                int hist_start = 0;

                for (i = 0; i < 512; i++)
                        hist[i] = c->pcm_history[i][ch];

                for (subs = 0; subs < 32; subs++) {
                        int32_t accum[128];
                        int32_t resp;
                        int band;

                        /* Calculate the convolutions at once */
                        for (i = 0; i < 128; i++)
                                accum[i] = 0;

                        for (k = 16, i = hist_start, j = 0; i < 512;
                             k = (k + 1) & 127, i++, j++)
                                accum[k] += mul32(hist[i], c->band_interpolation[j]);
                        for (i = 0; i < hist_start;
                             k = (k + 1) & 127, i++, j++)
                                accum[k] += mul32(hist[i], c->band_interpolation[j]);

                        for (k = 0; k < 32; k++)
                                accum[k] = accum[k] - accum[64 + k]
                                         - accum[63 - k] + accum[127 - k];

                        for (band = 0; band < 32; band++) {
                                resp = 0;
                                for (i = 0; i < 32; i++) {
                                        int s = (2 * band + 1) * (2 * i + 1);
                                        resp += mul32(accum[i], cos_t(s << 3)) >> 3;
                                }

                                c->subband_samples[subs][band][ch] =
                                        ((band + 1) & 2) ? (-resp) : resp;
                        }

                        /* Copy in 32 new samples from input */
                        for (i = 0; i < 32; i++)
                                hist[i + hist_start] =
                                        *pcm_sample(c, input, subs * 32 + i, ch);

                        hist_start = (hist_start + 32) & 511;
                }
        }
}
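
As an aside on the two accumulation loops in the function above: they walk
the 512-entry history as a ring buffer, oldest sample first, in two
segments. A minimal sketch of the same walk written with a masked index is
given below. The helper names and the test data are made up for
illustration, and plain int multiplication stands in for mul32().

```c
#include <assert.h>

#define HIST_LEN 512   /* same length as hist[] in the function above */

/* Two-segment walk, as in the function above: [start, 512), then [0, start). */
static long walk_two_loops(const int *hist, const int *coef, int start)
{
    long sum = 0;
    int i, j = 0;

    for (i = start; i < HIST_LEN; i++, j++)
        sum += (long)hist[i] * coef[j];
    for (i = 0; i < start; i++, j++)
        sum += (long)hist[i] * coef[j];
    return sum;
}

/* The same walk as a single loop with a masked (wrapping) index. */
static long walk_masked(const int *hist, const int *coef, int start)
{
    long sum = 0;
    int j;

    for (j = 0; j < HIST_LEN; j++)
        sum += (long)hist[(start + j) & (HIST_LEN - 1)] * coef[j];
    return sum;
}

/* Returns 1 if both walks give the same sum for the given start offset. */
static int walks_agree(int start)
{
    int hist[HIST_LEN], coef[HIST_LEN], n;

    for (n = 0; n < HIST_LEN; n++) {
        hist[n] = (n * 7 + 3) % 23 - 11;   /* arbitrary test data */
        coef[n] = (n * 5 + 1) % 17 - 8;    /* arbitrary test data */
    }
    assert(start >= 0 && start < HIST_LEN);
    return walk_two_loops(hist, coef, start) == walk_masked(hist, coef, start);
}
```

The two-segment form avoids the mask in the inner loop, which is probably
why it is written that way; the masked form is only easier to read.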

Note especially these lines:

for (k = 0; k < 32; k++)
  accum[k] = accum[k] - accum[64 + k] - accum[63 - k] + accum[127 - k];

The previous attempt didn't have these lines; instead, the loop over i
below (inside the loop over band) ran from 0 to 127. Folding with the
index reversed (63 - k, 127 - k) looks suspiciously similar to what we
have in the decoder. Note that c->band_interpolation[] is not the official
table, but something derived from it.

BTW, is it an absolute requirement that the decoder uses the raw official
table for prCoeff[]? Maybe, for clarity, it should first derive the
prototype lowpass filter from it, and then use this filter according to the
definition of a pseudo-QMF cosine modulation filter? Attached are the plots
of the original data table and the lowpass filter kernel extracted from it,
for the case of "perfect-reconstruction FIR". I think you can immediately
get the meaning of the "lowpass" plot, but "official data" is simply a
strange plot with no obvious meaning.

-- 
Alexander E. Patrakov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lowpass.png
Type: image/png
Size: 3903 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080901/9ad29b47/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: official-data.png
Type: image/png
Size: 4110 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080901/9ad29b47/attachment-0001.png>