[FFmpeg-devel] [PATCH] move H264 IDCT to yasm
Alexander Strange
astrange
Tue Sep 7 05:31:35 CEST 2010
On Sep 6, 2010, at 5:00 PM, Ronald S. Bultje wrote:
> Hi,
>
> this patch moves H264 IDCT (the LGPL part) to yasm. Performance for
> most loopy parts improves quite a bit, in some cases by up to 50%,
> because gcc does a remarkably poor job of setting up loops (I'm not
> joking here). Performance for one particular function (intra16_mmx2)
> is mildly worse (a few cycles), and I don't quite understand why,
> since the code is identical. This might be related to alignment (gcc
> pads with nops to align the targets it jmps to, and I don't yet know
> how to do that in yasm); otherwise I don't really know. Let me know
> if you want detailed performance statistics for each function.
>
> Ronald
> <yamsify-h264_idct.patch>
> +cglobal h264_idct_add16intra_mmx2, 5, 7, 0
> + xor r5, r5
> +.nextblock
> +%ifdef PIC;f660-f7f9=199=256+144+9=409 (mine), theirs=1e70-2034=
What's with the comment?
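
On the alignment question: yasm has an align directive, which pads
with nops in code sections, so placing it in front of the loop label
may be worth trying. The following is only a sketch in plain yasm
syntax without the x86inc.asm macros; the function name and loop body
are hypothetical, and whether alignment actually explains the
intra16_mmx2 difference is untested.

```asm
; Sketch only: loop_demo and its body are made up for illustration.
BITS 64
section .text
global loop_demo
loop_demo:
    xor   eax, eax          ; loop counter = 0
align 16                    ; pad with nops up to a 16-byte boundary
.nextblock:                 ; loop head now starts on that boundary
    inc   eax
    cmp   eax, 16
    jl    .nextblock        ; the backward branch lands on the aligned label
    ret
```

This should assemble with something like "yasm -f elf64". The padding
is executed once on the way into the loop, so it only pays off if the
loop runs often enough.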