[FFmpeg-devel] [PATCH] vp9/x86: 16px MC functions (64bit only).

Clément Bœsch u at pkh.me
Wed Jan 15 14:33:20 CET 2014


On Thu, Dec 26, 2013 at 09:05:37PM -0500, Ronald S. Bultje wrote:
> Cycle counts for large MCs (old -> new on ped1080p.webm, mx!=0&&my!=0):

decicycles?

> 16x8:    876 ->   870  (0.7%)
> 16x16:  1444 ->  1435  (0.7%)
> 16x32:  2784 ->  2748  (1.3%)
> 32x16:  2455 ->  2349  (4.5%)
> 32x32:  4641 ->  4084 (13.6%)
> 32x64:  9200 ->  7834 (17.4%)
> 64x32:  8980 ->  7197 (24.8%)
> 64x64: 17330 -> 13796 (25.6%)
> Total decoding time goes from 9.326sec to 9.182sec.
> ---
>  libavcodec/x86/vp9dsp_init.c |   5 ++
>  libavcodec/x86/vp9mc.asm     | 122 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 127 insertions(+)
> 
[...]
> +%if ARCH_X86_64
> +
> +%macro filter_vx2_fn 1
> +%assign %%px mmsize
> +cglobal %1_8tap_1d_v_ %+ %%px, 6, 8, 14, dst, dstride, src, sstride, h, filtery, src4, sstride3

> +    sub       srcq, sstrideq
> +    lea  sstride3q, [sstrideq*3]
> +    sub       srcq, sstrideq
> +    mova       m13, [pw_256]
> +    sub       srcq, sstrideq
> +    mova        m8, [filteryq+ 0]
> +    lea      src4q, [srcq+sstrideq*4]
> +    mova        m9, [filteryq+16]
> +    mova       m10, [filteryq+32]
> +    mova       m11, [filteryq+48]

Untested, but wouldn't it be simpler to have:

    lea  sstride3q, [sstrideq*3]
    lea      src4q, [srcq+sstrideq]
    sub       srcq, sstride3q
    mova       m13, [pw_256]
    mova        m8, [filteryq+ 0]
    mova        m9, [filteryq+16]
    mova       m10, [filteryq+32]
    mova       m11, [filteryq+48]

?
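
The address arithmetic, at least, can be checked outside the assembler. Below is a minimal C sketch that models srcq, src4q and sstride3q as plain integers and compares the prologue from the patch with the reordered one. The struct and function names are hypothetical and exist only for this check; it says nothing about instruction scheduling or cycle counts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the three registers the prologue sets up. */
typedef struct {
    intptr_t  src;      /* srcq      */
    intptr_t  src4;     /* src4q     */
    ptrdiff_t sstride3; /* sstride3q */
} PrologueRegs;

/* Order used in the patch: three subs, then lea from the adjusted srcq. */
static PrologueRegs prologue_patch(intptr_t src, ptrdiff_t sstride)
{
    PrologueRegs r;
    src       -= sstride;           /* sub srcq, sstrideq            */
    r.sstride3 = sstride * 3;       /* lea sstride3q, [sstrideq*3]   */
    src       -= sstride;           /* sub srcq, sstrideq            */
    src       -= sstride;           /* sub srcq, sstrideq            */
    r.src4     = src + sstride * 4; /* lea src4q, [srcq+sstrideq*4]  */
    r.src      = src;
    return r;
}

/* Suggested order: both leas read the original srcq, then a single sub. */
static PrologueRegs prologue_suggested(intptr_t src, ptrdiff_t sstride)
{
    PrologueRegs r;
    r.sstride3 = sstride * 3;       /* lea sstride3q, [sstrideq*3]   */
    r.src4     = src + sstride;     /* lea src4q, [srcq+sstrideq]    */
    r.src      = src - r.sstride3;  /* sub srcq, sstride3q           */
    return r;
}

/* Returns 1 when both orderings leave identical register values. */
static int prologues_match(intptr_t src, ptrdiff_t sstride)
{
    PrologueRegs a = prologue_patch(src, sstride);
    PrologueRegs b = prologue_suggested(src, sstride);
    return a.src == b.src && a.src4 == b.src4 && a.sstride3 == b.sstride3;
}
```

Both versions end with srcq = src - 3*sstride and src4q = src + sstride, so the reordering only changes how many dependent instructions touch srcq, not the resulting pointers.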

[...]

Rest LGTM

-- 
Clément B.