[FFmpeg-devel] [ping] [PATCH] mmx implementation of vc-1 inverse transformations
Yuriy Kaminskiy
yumkam
Thu Sep 30 19:09:49 CEST 2010
Yuriy Kaminskiy wrote:
> Hello!
>
> I've noticed an old and forgotten patch series by Victor Pollex
> (http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2008-July/050503.html) and
> forward-ported it to current ffmpeg. It seems to give 12%-20% faster decoding
> (benchmarked with ffmpeg [...] -f yuv4mpegpipe /dev/null, verified with -f framecrc).
>
> Well, not everything is good: the sse2 variant (at least on my machine - amd
> x2-4850e/i386/gcc-4.1.2; note that x264 thinks this is SSE2_Slow) is no faster,
> or even slightly slower, than mmx, so maybe 40_*.patch should be skipped for
> now (strangely, with START|STOP_TIMER it looks slightly faster with sse2, but
> according to `time ./ffmpeg [...] /dev/null` it is actually slightly slower;
> maybe i-cache effects?).
>
> time ./ffmpeg -i file.mkv -f yuv4mpegpipe /dev/null
> (fastest of three runs)
> unpatched: 41.903
> c: 41.599
> sse2_all: 34.202
> sse2: 34.198
> mmx: 33.946
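
To put the quoted timings in relative terms, here is a minimal sketch that computes how much wall time each variant saves against the unpatched run. It assumes the figures are seconds as reported by `time` (the mail gives no unit); the `timings` dict and `reduction_percent` helper are names made up for this illustration.

```python
# Wall-clock timings quoted above, taken as seconds (an assumption; no unit is given).
timings = {
    "unpatched": 41.903,
    "c": 41.599,
    "sse2_all": 34.202,
    "sse2": 34.198,
    "mmx": 33.946,
}


def reduction_percent(baseline: float, candidate: float) -> float:
    """Percentage of wall time saved by `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline * 100.0


baseline = timings["unpatched"]
for name, seconds in timings.items():
    saved = reduction_percent(baseline, seconds)
    print(f"{name:10s} {seconds:7.3f}  {saved:5.1f}% less time")
```

On these numbers the mmx variant saves a bit under 19% of the wall time, and the two sse2 variants land slightly behind it.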
>
> The original series also altered the _c version for the transposed variant,
> but as there is already an optimized ppc/altivec variant, and the
> (!res_fasttx) branch was broken by the original series (it used the wrong
> offset in vc1_decode_p_block), I considered it easier to drop that and just
> use the _transposed flags.
> I've also fixed problematic asm arguments: s/(0x\d\d)%0/$1(%0)/.
Doh. Some of this kind somehow slipped through (I was totally sure I had
replaced all of them a long time ago :-|). Fixed version attached.
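
For reference, a minimal sketch of what the s/(0x\d\d)%0/$1(%0)/ substitution does, written in Python. Replacing every match on a line (as with a trailing "g") is my assumption, and `fix_offsets` and the sample asm line are hypothetical, not taken from the patch.

```python
import re

# Python spelling of s/(0x\d\d)%0/$1(%0)/ ; applying it to every match on a
# line is an assumption, the expression as quoted has no "g" flag.
OFFSET_BEFORE_OPERAND = re.compile(r"(0x\d\d)%0")


def fix_offsets(asm_line: str) -> str:
    """Turn a bare offset glued to operand %0 into offset(%0) addressing."""
    return OFFSET_BEFORE_OPERAND.sub(r"\1(%0)", asm_line)


# Hypothetical asm template line, for illustration only.
before = "movq 0x10%0, %%mm1"
print(before, "->", fix_offsets(before))
```

Note that the pattern only matches two decimal digits after 0x and only operand %0, so an offset written against another operand number (0x10%3, say) passes through unchanged.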
>
[...]
> + TRANSFORM_4X8_COL_H2
> + (
> + q,q,
> + 0x00%3,0x10%3,0x20%3,0x30%3,0x40%3,0x50%3,0x60%3,0x70%3,
> + %%mm0,%%mm1,%%mm2,%%mm3,%%mm4,%%mm5,%%mm6,%%mm7,
> + %4
> + )
[...]
> + : "+r"(dest)
> + : "r"((x86_reg)linesize), "r"((x86_reg)linesize*3), "m"(temp[0]), "m"(ff_pw_64)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 30_vc1dsp_mmx-5.patch
Type: text/x-diff
Size: 33377 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100930/ec929eb0/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 40_vc1dsp_sse2-4.patch
Type: text/x-diff
Size: 9505 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100930/ec929eb0/attachment-0001.patch>