[Ffmpeg-devel] VP3/Theora Perfection
Michael Niedermayer
michaelni
Tue May 17 13:55:52 CEST 2005
Hi
On Monday 16 May 2005 22:10, Mike Melanson wrote:
> Michael Niedermayer wrote:
[...]
> > * the switch / case mess used for some vlc decoding
>
> Expound. Are you talking about the unpack_token() function? That is
yes, and get_motion_vector_vlc()
> called a lot and perhaps should be inline'd. Otherwise, the actual
> switch/case logic should reduce to a jump table. On2's original code
you don't seem to be aware that jump tables with unpredictable jump targets are
very slow
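
A minimal sketch of the table-driven alternative to a switch/case decoder, for illustration only. The 4-symbol prefix code, the VlcEntry and BitReader types, and the function names are all hypothetical; this is not the VP3 token table or the libavcodec bitstream API. The idea is that a few peeked bits index a table of (symbol, length) pairs, so decoding needs a data load rather than an indirect jump:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical prefix code: 0 -> 0, 10 -> 1, 110 -> 2, 111 -> 3. */
typedef struct { uint8_t sym; uint8_t len; } VlcEntry;

/* Indexed by the next 3 bits of the stream (MSB first). */
static const VlcEntry vlc_table[8] = {
    {0, 1}, {0, 1}, {0, 1}, {0, 1},   /* 0xx */
    {1, 2}, {1, 2},                   /* 10x */
    {2, 3},                           /* 110 */
    {3, 3},                           /* 111 */
};

/* Hypothetical minimal bit reader; pos counts bits from the start. */
typedef struct { const uint8_t *buf; size_t size; size_t pos; } BitReader;

/* Return the next 3 bits without consuming them; reads past the end as 0. */
static unsigned peek3(const BitReader *br)
{
    unsigned v = 0;
    for (int i = 0; i < 3; i++) {
        size_t p = br->pos + (size_t)i;
        unsigned bit = 0;
        if (p < br->size * 8)
            bit = (br->buf[p >> 3] >> (7 - (p & 7))) & 1u;
        v = (v << 1) | bit;
    }
    return v;
}

/* One table load replaces the switch: no data-dependent branch target. */
static int decode_symbol(BitReader *br)
{
    VlcEntry e = vlc_table[peek3(br)];
    br->pos += e.len;
    return e.sym;
}
```

Whether this actually beats a compiler-generated jump table depends on the CPU and on table size, so it would need benchmarking.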
[...]
> > actually
> > the dequant should be done during bitstream decoding
>
> Why? Dequantization is a parallelizable operation that can be optimized
> with SIMD instructions. That is why it is done at the same time as the
> optimized IDCTs.
i prefer to multiply 2 elements without SIMD over multiplying 64 with SIMD
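
A rough sketch of the difference, under assumptions: the Coeff struct, the function names and the flat 64-entry quantizer matrix are hypothetical and not taken from vp3.c. Dequantizing while decoding costs one multiply per coded coefficient, while a separate pass always touches all 64 entries of the block:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical (position, level) pair as produced by a token decoder. */
typedef struct { uint8_t pos; int16_t level; } Coeff;

/* Separate pass: multiplies all 64 entries, mostly zeros in sparse blocks. */
static void dequant_full(int16_t block[64], const int16_t qmat[64])
{
    for (int i = 0; i < 64; i++)
        block[i] = (int16_t)(block[i] * qmat[i]);
}

/* During decoding: one multiply per coefficient actually present.
 * Returns the number of multiplications performed. */
static int decode_and_dequant(int16_t block[64], const int16_t qmat[64],
                              const Coeff *coeffs, int n)
{
    memset(block, 0, 64 * sizeof(block[0]));
    for (int i = 0; i < n; i++) {
        int pos = coeffs[i].pos & 63;   /* keep the index inside the block */
        block[pos] = (int16_t)(coeffs[i].level * qmat[pos]);
    }
    return n;
}
```

For a block with 2 coded coefficients the second function does 2 multiplies; the first does 64 regardless of how they are vectorized.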
[...]
> > * mmx.h based asm code (slow due to gcc bugs, and problematic due to bugs in
> > mmx.h)
> >
> >>has MMX and SSE2 optimizations that I can port over when I am confident
> >>that the C-based loop filter works.
> >
> > note, please do not use mmx.h,
>
> Please give me a good reason. I have checked code generated from mmx.h
> against objdump and the generated ASM is correct.
the operand constraints in mmx.h are wrong, for example:
#define mmx_m2ri(op,mem,reg,imm) \
__asm__ __volatile__ (#op " %1, %0, %%" #reg \
: /* nothing */ \
: "X" (mem), "X" (imm))
the speed of mmx.h code depends strongly upon the compiler ...
you cannot use integer instructions or non mmx registers directly
there's no real advantage over asm() style
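
For comparison, a small sketch of plain asm() style with explicit operand constraints. The function name add4_int16 is hypothetical, and the C fallback is only there so the example builds on non-x86 targets or with MMX disabled. Pointers are passed in general-purpose registers with "r" (rather than the catch-all "X"), and the clobber list tells the compiler that mm0 and memory are modified:

```c
#include <stdint.h>

/* Add four 16-bit values: dst[i] = a[i] + b[i] (wrapping), i = 0..3. */
static void add4_int16(int16_t *dst, const int16_t *a, const int16_t *b)
{
#if defined(__GNUC__) && defined(__MMX__)
    __asm__ volatile(
        "movq  (%1), %%mm0 \n\t"   /* load 4 words from a          */
        "paddw (%2), %%mm0 \n\t"   /* add 4 words from b           */
        "movq  %%mm0, (%0) \n\t"   /* store the result to dst      */
        "emms              \n\t"   /* leave the FPU state usable   */
        : /* no register outputs; dst is written through memory */
        : "r"(dst), "r"(a), "r"(b)
        : "mm0", "memory");
#else
    /* Portable fallback with the same result. */
    for (int i = 0; i < 4; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
#endif
}
```

Integer instructions and integer registers can be mixed freely into the same asm block this way, which is what the mmx.h macros make awkward.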
[...]
--
Michael