[FFmpeg-devel] [PATCH] 'vorbis_residue_decode' optimizations
Loren Merritt
lorenm
Thu Sep 11 06:25:34 CEST 2008
On Wed, 10 Sep 2008, Siarhei Siamashka wrote:
> On Tuesday 09 September 2008, Loren Merritt wrote:
>
>> You could try decoding residual in channel-interleaved order, so that
>> consecutive codebook entries are consecutive in decoded memory. The SIMD
>> savings might be worth an extra copy to deinterleave afterward.
>
> Are you suggesting deinterleaving codebook entries beforehand, at the header
> setup stage, so that when they are used in the residue decode function later,
> no extra 'shufps' SSE instructions would be needed? This might actually work.
Better than nothing, though it doesn't help dim2.
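A minimal C sketch of the two layouts discussed above, assuming a stereo residue whose decoded vector alternates channels ([c0 c1 c0 c1 ...]) and codebook entries of even dimension that start on an even offset. deinterleave_residue and deinterleave_codebook are hypothetical helpers, not FFmpeg functions: the first is the extra copy after decoding in interleaved order, the second permutes each entry once at header setup so that its first half belongs to one channel and its second half to the other. For dim 2 that permutation is the identity, which is presumably why it does not help dim2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not FFmpeg API): split a channel-interleaved
 * residue buffer [c0 c1 c0 c1 ...] into two per-channel arrays.
 * This is the "extra copy to deinterleave afterward"; n is the number
 * of samples per channel. */
static void deinterleave_residue(const float *interleaved, float *ch0,
                                 float *ch1, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ch0[i] = interleaved[2 * i];
        ch1[i] = interleaved[2 * i + 1];
    }
}

/* Hypothetical setup-time permutation (assumes 2 channels): each entry of
 * dimension dim is stored as [a0 b0 a1 b1 ...] and is rewritten in place
 * as [a0 a1 ... b0 b1 ...]. For dim == 2 this is the identity. */
static void deinterleave_codebook(float *codevectors, size_t entries,
                                  size_t dim)
{
    assert(dim % 2 == 0 && dim <= 16);
    float tmp[16];
    for (size_t e = 0; e < entries; e++) {
        float *v = codevectors + e * dim;
        for (size_t j = 0; j < dim / 2; j++) {
            tmp[j]           = v[2 * j];     /* channel 0 value */
            tmp[dim / 2 + j] = v[2 * j + 1]; /* channel 1 value */
        }
        for (size_t j = 0; j < dim; j++)
            v[j] = tmp[j];
    }
}
```

Under these assumptions, a dim-4 entry stored as [a0 a1 b0 b1] could be written to the two channels with one movlps and one movhps store and no shufps, while a dim-2 entry [a b] gains nothing.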
>+ while ((step -= 4) >= 0) {
>+ UPDATE_CACHE(re, gb)
>+ VORBIS_GET_VLC(coffs, re, gb, vlc_table, codebook_nb_bits, codebook_nb_bits_mask, 3, 1)
>+ asm volatile ("movlps 0(%0,%1,8), %%xmm0 \n" : : "r" (codevectors), "r" (coffs));
>+ VORBIS_GET_VLC(coffs, re, gb, vlc_table, codebook_nb_bits, codebook_nb_bits_mask, 3, 0)
>+ asm volatile ("movhps (%0,%1,8), %%xmm0 \n" : : "r" (codevectors), "r" (coffs));
>+ UPDATE_CACHE(re, gb)
>+ VORBIS_GET_VLC(coffs, re, gb, vlc_table, codebook_nb_bits, codebook_nb_bits_mask, 3, 1)
>+ asm volatile ("movlps 0(%0,%1,8), %%xmm1 \n" : : "r" (codevectors), "r" (coffs));
>+ VORBIS_GET_VLC(coffs, re, gb, vlc_table, codebook_nb_bits, codebook_nb_bits_mask, 3, 0)
>+ asm volatile ("movhps (%0,%1,8), %%xmm1 \n" : : "r" (codevectors), "r" (coffs));
>+ asm volatile ("movaps %xmm0, %xmm3 \n");
>+ asm volatile ("shufps $0x88, %xmm1, %xmm0 \n");
>+ asm volatile ("shufps $0xDD, %xmm1, %xmm3 \n");
>+ asm volatile ("movaps 0(%0), %%xmm4 \n" : : "r" (p1));
>+ asm volatile ("movaps 0(%0), %%xmm5 \n" : : "r" (p2));
>+ asm volatile ("addps %xmm0, %xmm4 \n");
>+ asm volatile ("addps %xmm3, %xmm5 \n");
asm volatile ("addps (%0), %%xmm0 \n" : : "r" (p1));
asm volatile ("addps (%0), %%xmm3 \n" : : "r" (p2));
>+ if (step & 2) {
>+ UPDATE_CACHE(re, gb)
>+ VORBIS_GET_VLC(coffs, re, gb, vlc_table, codebook_nb_bits, codebook_nb_bits_mask, 3, 1)
>+ asm volatile ("movlps 0(%0,%1,8), %%xmm0 \n" : : "r" (codevectors), "r" (coffs));
>+ VORBIS_GET_VLC(coffs, re, gb, vlc_table, codebook_nb_bits, codebook_nb_bits_mask, 3, 0)
>+ asm volatile ("movhps (%0,%1,8), %%xmm0 \n" : : "r" (codevectors), "r" (coffs));
>+ asm volatile ("shufps $0xD8, %xmm0, %xmm0 \n");
unpcklps is faster than shufps
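To make the lane movement concrete, here is a scalar C model of the shuffles in the patch, using the AT&T operand order of the asm above. The xmm type and model_* functions are hypothetical stand-ins, not real intrinsics. It checks that shufps $0x88 / $0xDD split [a0 b0 a1 b1] and [a2 b2 a3 b3] into per-channel vectors, and that the tail's shufps $0xD8 yields the same lanes as loading the two dim-2 entries into separate registers with movlps and combining them with unpcklps. That unpcklps sequence is one reading of the suggestion, assumed rather than taken from the thread, and the model says nothing about speed.

```c
#include <assert.h>
#include <stdint.h>

/* Four float lanes of an XMM register; lane 0 is the lowest address. */
typedef struct { float f[4]; } xmm;

/* Model of `shufps imm, src, dst`: low two lanes picked from dst,
 * high two lanes picked from src, two selector bits per lane. */
static xmm model_shufps(uint8_t imm, xmm src, xmm dst)
{
    xmm r;
    r.f[0] = dst.f[imm & 3];
    r.f[1] = dst.f[(imm >> 2) & 3];
    r.f[2] = src.f[(imm >> 4) & 3];
    r.f[3] = src.f[(imm >> 6) & 3];
    return r;
}

/* Model of `unpcklps src, dst`: interleave the low two lanes. */
static xmm model_unpcklps(xmm src, xmm dst)
{
    xmm r = {{ dst.f[0], src.f[0], dst.f[1], src.f[1] }};
    return r;
}

/* Model of movlps / movhps loading one 2-float codebook entry. */
static xmm model_movlps(xmm dst, const float *entry)
{
    dst.f[0] = entry[0];
    dst.f[1] = entry[1];
    return dst;
}

static xmm model_movhps(xmm dst, const float *entry)
{
    dst.f[2] = entry[0];
    dst.f[3] = entry[1];
    return dst;
}

/* Stereo residue, dim-2 entries: a_i go to channel 0, b_i to channel 1. */
static void check_shuffles(void)
{
    const float e0[2] = {0, 10}, e1[2] = {1, 11};
    const float e2[2] = {2, 12}, e3[2] = {3, 13};
    xmm z = {{0, 0, 0, 0}};
    xmm x0 = model_movhps(model_movlps(z, e0), e1); /* [a0 b0 a1 b1] */
    xmm x1 = model_movhps(model_movlps(z, e2), e3); /* [a2 b2 a3 b3] */
    xmm ch0 = model_shufps(0x88, x1, x0);           /* [a0 a1 a2 a3] */
    xmm ch1 = model_shufps(0xDD, x1, x0);           /* [b0 b1 b2 b3] */
    for (int i = 0; i < 4; i++) {
        assert(ch0.f[i] == (float)i);
        assert(ch1.f[i] == (float)(10 + i));
    }
    /* Tail: shufps $0xD8 on one register vs. movlps + movlps + unpcklps. */
    xmm t_shuf = model_shufps(0xD8, x0, x0);
    xmm t_unpck = model_unpcklps(model_movlps(z, e1), model_movlps(z, e0));
    for (int i = 0; i < 4; i++)
        assert(t_shuf.f[i] == t_unpck.f[i]);
}
```

Both tail variants produce [a0 a1 b0 b1]; the unpcklps form avoids the shufps at the cost of using a second register for the second entry.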
--Loren Merritt