[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms
Christophe GISQUET
christophe.gisquet
Mon Jan 14 21:12:40 CET 2008
Hi,
Ivan Kalvachev wrote:
> - Why you choose to transpose at all. Just to save time and effort?
Time, effort, code size and possibly speed. It might be possible to
write code specific to the horizontal pass, but the code and/or the
constants might need to be different for each line.
In addition, some code in dsputils_mmx (h263_h_loop_filter_mmx) does
just that: transpose before processing. Granted, not all of that code is
about fwd/inv transforms, so it might be a little different.
But just look at the transforms in h264dsp_mmx.c.
Using the scantable transpose 'trick', some transposes are not needed,
but some are still done. And a 1-D DCT is used in a loop. So, except
for the unneeded initial transpose, the code is very similar.
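To illustrate the structure described above, here is a minimal C sketch
of "transpose, then reuse one 1-D kernel for both passes". It is not the
MMX code from the patch: transform_2d, transpose_sq and
demo_col_butterfly are hypothetical names, and the toy butterfly only
stands in for a real 1-D transform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kernel type: transforms every column of an n x n block in place. */
typedef void (*col_pass_fn)(int16_t *blk, int n);

/* In-place transpose of a square n x n block stored row-major. */
static void transpose_sq(int16_t *blk, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            int16_t tmp = blk[i * n + j];
            blk[i * n + j] = blk[j * n + i];
            blk[j * n + i] = tmp;
        }
    }
}

/* Toy column kernel (assumption, not a VC-1 transform): sum/difference
 * butterfly on each pair of rows (0,1), (2,3), ... of every column. */
static void demo_col_butterfly(int16_t *blk, int n)
{
    for (int c = 0; c < n; c++) {
        for (int r = 0; r + 1 < n; r += 2) {
            int16_t a = blk[r * n + c];
            int16_t b = blk[(r + 1) * n + c];
            blk[r * n + c]       = (int16_t)(a + b);
            blk[(r + 1) * n + c] = (int16_t)(a - b);
        }
    }
}

/* 2-D transform built from one column kernel and two transposes. */
static void transform_2d(int16_t *blk, int n, col_pass_fn pass)
{
    assert(blk && pass && n > 0);
    transpose_sq(blk, n); /* rows become columns */
    pass(blk, n);         /* row pass, done with the column kernel */
    transpose_sq(blk, n); /* restore the original orientation */
    pass(blk, n);         /* column pass */
}
```

Calling transform_2d(blk, 2, demo_col_butterfly) on the block {1, 2, 3, 4}
gives the same result as a dedicated row pass followed by the column
pass; the price is the two transposes, which is the trade-off discussed
here.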
> It is usual to have separate version of SIMD depending if they work on
> row or columns. The row and column stages are different and you pass
> the differences as parameters.
Here the problem is mainly about loading registers. For instance, the
vc1 qpel code uses the same function with said differing parameters. For
the transforms, I don't think that approach is optimal, and Michaël not
criticizing this point goes in that direction (famous last words before
ignition).
> - Am I wrong or you do all the math in 16 bit signed saturation mode?
> According to vc1 draft in first stage the input is in the range
> [-2048;2047] the multiply constants are in range [-16;16], this makes
> range [-32768;32768] per multiply and you can have 8 of them.
> Or multiply constants in range [-22;22], that make range
> [-45056;45056] per multiply and you can have 4 of them.
> In the second phase the input range is doubled to [-4096,4095]
>
> Are you sure your transforms produce the same result as their _c equivalents?
I did test bit exactness (against the win32 dll output), albeit on only
a few sequences. Everything was perfect.
The reference I found said it could be done with 16-bit math. Maybe it
needs a bias to correct, but as the output is usually in the range
[-128;127], it's pretty symmetrical. However, it would indeed be better
if a proof could be given.
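The concern in the question can be put in numbers. The sketch below
computes only a loose worst-case bound from the figures quoted above
(8 terms with |constant| <= 16, or 4 terms with |constant| <= 22, and
|input| <= 2048). The helper names are made up for illustration, and
the bound ignores any structure in valid input and any intermediate
shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Loose upper bound on |sum| for `terms` products where every
 * |coefficient| <= max_coef and every |input| <= max_in. */
static long worst_case_sum(int terms, long max_coef, long max_in)
{
    assert(terms > 0 && max_coef >= 0 && max_in >= 0);
    return (long)terms * max_coef * max_in;
}

/* Returns 1 if a value of this magnitude is representable as a
 * positive int16_t, 0 otherwise. */
static int fits_int16(long magnitude)
{
    return magnitude <= INT16_MAX;
}
```

With the quoted figures, worst_case_sum(8, 16, 2048) is 262144 and
worst_case_sum(4, 22, 2048) is 180224, and even a single product
16 * 2048 = 32768 is one past INT16_MAX. So the naive bound does not fit
in 16 bits; a proof would have to rely on tighter constraints than these.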
> - Have you seen how other IDCT optimizations work? I may be wrong but
> vc1 transformations look like IDCT with quite simplified (smaller)
> coefficients.
See my comments about H.264. Following Michaël's mail, I did more
homework and found how H.264 handles it. I'll start a new thread on that
specific topic.
Best regards,
Christophe GISQUET