[FFmpeg-devel] [PATCH] SSE optimization for DCA decoder
Benjamin Larsson
Mon Sep 1 10:12:34 CEST 2008
Alexander E. Patrakov wrote:
> IMHO, it is still too early to do this, because of missed
> high-level "optimizations" (quoted because no further speed gain on
> the "window" operation seems possible). As I said earlier, the funky
> indexing seems to mean either two transforms at once, or maybe simply a
> longer transform than written. In support of this view, here is the
> rewritten (according to
> http://ccrma.stanford.edu/~jos/sasp/Pseudo_QMF_Cosine_Modulation_Filter.html,
> thanks to Benjamin Larsson for the important keywords!) inverse subband
> transform (for the encoder), that still uses the naive form of the DCT:
>
> [...]
>
> Note especially these lines:
>
>     for (k = 0; k < 32; k++)
>         accum[k] = accum[k] - accum[64 + k] - accum[63 - k] + accum[127 - k];
>
> The previous try didn't have these lines, but also, below, had the loop over
> i (inside the loop over band) go from 0 to 127. This time-reversal looks
> suspiciously similar to what we have in the decoder.
> c->band_interpolation[] is not the official table, but something derived
> from it.
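For readers following along, here is a minimal standalone sketch of the folding step quoted above. The loop body is taken verbatim from the quoted lines; the helper name fold_accum, the FOLD_IN/FOLD_OUT macros and the choice of double for the element type are assumptions made for illustration, since the surrounding encoder code is elided in the quote.

```c
#include <assert.h>
#include <stddef.h>

#define FOLD_IN  128  /* accumulators before folding, per the quoted indices */
#define FOLD_OUT 32   /* folded results end up in accum[0..31] */

/* Hypothetical helper: wraps the quoted loop so it can be tried in isolation.
 * The element type (double) is an assumption, not taken from the patch. */
static void fold_accum(double accum[FOLD_IN])
{
    size_t k;

    assert(accum != NULL);

    for (k = 0; k < FOLD_OUT; k++) {
        /* For k in [0,31] the four reads hit disjoint ranges:
         * [0,31], [64,95], [32,63] (reversed) and [96,127] (reversed).
         * Only accum[k] is written, after it has been read, so the
         * fold is safe to do in place. */
        accum[k] = accum[k] - accum[64 + k] - accum[63 - k] + accum[127 - k];
    }
}
```

After the call, accum[0..31] holds the folded values and accum[32..127] is left untouched. The two reversed reads (63 - k and 127 - k) are the time-reversal mentioned in the quoted text.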
>
> BTW, is it an absolute requirement that the decoder uses the raw official
> table for prCoeff[]?
No, anything that is analytically the same is acceptable (with one
exception: changes that introduce numerical instabilities will most
likely be rejected). Good specs should explain how something was
obtained. DTS probably didn't want to say how they got their filter and
thus obfuscated the specs.
> Maybe, for clarity, it should first derive the
> prototype lowpass filter from it, and then use this filter according to the
> definition of a pseudo-QMF cosine modulation filter? Attached are the plots
> of the original data table and the lowpass filter kernel extracted from it,
> for the case of "perfect-reconstruction FIR". I think you can immediately
> get the meaning of the "lowpass" plot, but "official data" is simply a
> strange plot with no obvious meaning.
>
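As a reminder of what "use this filter according to the definition" would mean, the textbook form of a cosine-modulated pseudo-QMF analysis bank is sketched below. The symbols are assumptions for illustration: p[n] is the prototype lowpass filter of length N, M is the number of bands, and phi_k is the per-band phase. Scaling factors and phase conventions differ between references, and nothing in this thread establishes that the official table matches this exact convention.

```latex
h_k[n] = p[n]\,\cos\!\left[\frac{\pi}{M}\left(k+\tfrac{1}{2}\right)
         \left(n-\frac{N-1}{2}\right)+\phi_k\right],
\qquad \phi_k = (-1)^k\,\frac{\pi}{4},
\qquad k = 0,\dots,M-1,\quad n = 0,\dots,N-1
```

The synthesis filters in this form use the same expression with the sign of phi_k flipped.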
Yeah, it looked quite funky. But can the code be made faster with the
"lowpass" representation?
Best regards,
Benjamin Larsson