[FFmpeg-devel] [PATCH] vp9/x86: 16px MC functions (64bit only).
Clément Bœsch
u at pkh.me
Wed Jan 15 14:33:20 CET 2014
On Thu, Dec 26, 2013 at 09:05:37PM -0500, Ronald S. Bultje wrote:
> Cycle counts for large MCs (old -> new on ped1080p.webm, mx!=0&&my!=0):
decicycles?
> 16x8: 876 -> 870 (0.7%)
> 16x16: 1444 -> 1435 (0.7%)
> 16x32: 2784 -> 2748 (1.3%)
> 32x16: 2455 -> 2349 (4.5%)
> 32x32: 4641 -> 4084 (13.6%)
> 32x64: 9200 -> 7834 (17.4%)
> 64x32: 8980 -> 7197 (24.8%)
> 64x64: 17330 -> 13796 (25.6%)
> Total decoding time goes from 9.326sec to 9.182sec.
> ---
> libavcodec/x86/vp9dsp_init.c | 5 ++
> libavcodec/x86/vp9mc.asm | 122 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 127 insertions(+)
>
[...]
> +%if ARCH_X86_64
> +
> +%macro filter_vx2_fn 1
> +%assign %%px mmsize
> +cglobal %1_8tap_1d_v_ %+ %%px, 6, 8, 14, dst, dstride, src, sstride, h, filtery, src4, sstride3
> + sub srcq, sstrideq
> + lea sstride3q, [sstrideq*3]
> + sub srcq, sstrideq
> + mova m13, [pw_256]
> + sub srcq, sstrideq
> + mova m8, [filteryq+ 0]
> + lea src4q, [srcq+sstrideq*4]
> + mova m9, [filteryq+16]
> + mova m10, [filteryq+32]
> + mova m11, [filteryq+48]
Untested, but wouldn't it be simpler to have:
    lea sstride3q, [sstrideq*3]
    lea src4q, [srcq+sstrideq]
    sub srcq, sstride3q
    mova m13, [pw_256]
    mova m8, [filteryq+ 0]
    mova m9, [filteryq+16]
    mova m10, [filteryq+32]
    mova m11, [filteryq+48]
?
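
For what it's worth, the address arithmetic of the two sequences can be compared with a small model. This is only a sketch: src and sstride are plain integers standing in for srcq and sstrideq, the helper names setup_original and setup_suggested are made up for illustration, and it says nothing about scheduling or speed, only that both sequences should leave the same values in srcq, src4q and sstride3q.

```python
# Model of the pointer setup only. Integers stand in for the registers;
# both function names are hypothetical and exist just for this check.

def setup_original(src, sstride):
    """Mirror the patch: three subs, with src4 computed after all of them."""
    src -= sstride                # sub srcq, sstrideq
    sstride3 = sstride * 3        # lea sstride3q, [sstrideq*3]
    src -= sstride                # sub srcq, sstrideq
    src -= sstride                # sub srcq, sstrideq
    src4 = src + sstride * 4      # lea src4q, [srcq+sstrideq*4]
    return src, src4, sstride3

def setup_suggested(src, sstride):
    """Mirror the suggestion: src4 from the untouched src, then one sub."""
    sstride3 = sstride * 3        # lea sstride3q, [sstrideq*3]
    src4 = src + sstride          # lea src4q, [srcq+sstrideq]
    src -= sstride3               # sub srcq, sstride3q
    return src, src4, sstride3

# Compare over a few base addresses and strides, including negative strides.
for src in (0, 4096, 1 << 40):
    for sstride in (-1920, -64, 16, 64, 1920):
        assert setup_original(src, sstride) == setup_suggested(src, sstride)

print("both sequences agree")
```

The model relies on src4 being src - 3*sstride + 4*sstride = src + sstride in the original, which is why the suggestion can form src4q before adjusting srcq.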
[...]
Rest LGTM
--
Clément B.