[FFmpeg-devel] [PATCH] SSE optimization for DCA decoder
Alexander E. Patrakov
patrakov
Mon Sep 1 09:36:16 CEST 2008
Michael Niedermayer wrote:
> Nice, but as you probably already know, my high-level optimizations
> broke your patch.
>
> If you want to update it, also look at ff_mpa_synth_filter(), which
> performs the same windowing operation but with a quite different
> implementation. I do not know which way is more efficient in SIMD;
> actually, I don't know which is better for C either ...
IMHO, it is still too early to do this, because some high-level
"optimizations" are still missing (quoted because no further speed gain on
the "window" operation itself seems possible). As I said earlier, the funky
indexing seems to mean either two transforms at once, or simply a longer
transform than the one written. In support of this view, here is the
inverse subband transform (the one used in the encoder), rewritten according to
http://ccrma.stanford.edu/~jos/sasp/Pseudo_QMF_Cosine_Modulation_Filter.html
(thanks to Benjamin Larsson for the important keywords!). It still uses a
naive form of the DCT:
static void dcaenc_subband_transform(dcaenc_context c, const int32_t *input)
{
    int ch, subs, i, k, j;
    for (ch = 0; ch < c->fullband_channels; ch++) {
        /* History is copied because it is also needed for PSY */
        int32_t hist[512];
        int hist_start = 0;
        for (i = 0; i < 512; i++)
            hist[i] = c->pcm_history[i][ch];
        for (subs = 0; subs < 32; subs++) {
            int32_t accum[128];
            int32_t resp;
            int band;
            /* Calculate the convolutions at once */
            for (i = 0; i < 128; i++)
                accum[i] = 0;
            for (k = 16, i = hist_start, j = 0; i < 512; k = (k + 1) & 127, i++, j++)
                accum[k] += mul32(hist[i], c->band_interpolation[j]);
            for (i = 0; i < hist_start; k = (k + 1) & 127, i++, j++)
                accum[k] += mul32(hist[i], c->band_interpolation[j]);
            for (k = 0; k < 32; k++)
                accum[k] = accum[k] - accum[64 + k] - accum[63 - k] + accum[127 - k];
            for (band = 0; band < 32; band++) {
                resp = 0;
                for (i = 0; i < 32; i++) {
                    int s = (2 * band + 1) * (2 * i + 1);
                    resp += mul32(accum[i], cos_t(s << 3)) >> 3;
                }
                c->subband_samples[subs][band][ch] = ((band + 1) & 2) ? (-resp) : resp;
            }
            /* Copy in 32 new samples from input */
            for (i = 0; i < 32; i++)
                hist[i + hist_start] = *pcm_sample(c, input, subs * 32 + i, ch);
            hist_start = (hist_start + 32) & 511;
        }
    }
}
Note especially these lines:
for (k = 0; k < 32; k++)
    accum[k] = accum[k] - accum[64 + k] - accum[63 - k] + accum[127 - k];
The previous attempt did not have these lines; instead, the loop over i
further down (inside the loop over band) ran from 0 to 127. This time
reversal looks suspiciously similar to what the decoder does.
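A short derivation of why the fold gives the same result as the old 0..127 loop, assuming the 32-point kernel is C(i) = cos(pi (2b+1)(2i+1) / 128). That form is inferred from cos_t(s << 3) only under the assumption that cos_t() covers one period in 2048 steps; the helper itself is not shown here. Here b is the band index and a_i is accum[i] before the fold.

```latex
C(i) = \cos\frac{\pi m (2i+1)}{128}, \qquad m = 2b+1 \ \text{(odd)}

\begin{aligned}
C(63-k)  &= \cos\left(\pi m - \frac{\pi m (2k+1)}{128}\right) = -C(k), \\
C(64+k)  &= \cos\left(\pi m + \frac{\pi m (2k+1)}{128}\right) = -C(k), \\
C(127-k) &= \cos\left(2\pi m - \frac{\pi m (2k+1)}{128}\right) = C(k),
\end{aligned}

\sum_{i=0}^{127} a_i\, C(i) = \sum_{k=0}^{31} \left(a_k - a_{64+k} - a_{63-k} + a_{127-k}\right) C(k)
```

For k = 0..31 the indices k, 63 - k, 64 + k and 127 - k cover 0..31, 32..63, 64..95 and 96..127 exactly once, so no term is lost; the signs come from m being odd.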
c->band_interpolation[] is not the official table, but something derived
from it.
BTW, is it an absolute requirement that the decoder use the raw official
table for prCoeff[]? For clarity, maybe it should first derive the
prototype lowpass filter from that table, and then apply this filter
according to the definition of a pseudo-QMF cosine modulation filter bank.
Attached are plots of the original data table and of the lowpass filter
kernel extracted from it, for the "perfect-reconstruction FIR" case. The
meaning of the "lowpass" plot should be immediately clear, whereas
"official data" is just a strange plot with no obvious meaning.
--
Alexander E. Patrakov
[Attachment scrubbed: lowpass.png (image/png, 3903 bytes)]
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080901/9ad29b47/attachment.png>
[Attachment scrubbed: official-data.png (image/png, 4110 bytes)]
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080901/9ad29b47/attachment-0001.png>