[FFmpeg-devel] [PATCH 2/2] aarch64: h264qpel: Do vertical filtering without transposing
Martin Storsjö
martin at martin.st
Mon Oct 18 14:34:43 EEST 2021
On Fri, 3 Sep 2021, Martin Storsjö wrote:
> This gives rather big speedups on these functions:
>
> Before:
> put_h264_qpel_8_mc01_8_neon: 241.0 131.5 138.7
> put_h264_qpel_8_mc02_8_neon: 214.7 121.2 127.5
> put_h264_qpel_8_mc03_8_neon: 242.5 131.2 135.7
> put_h264_qpel_8_mc11_8_neon: 421.2 218.7 251.0
> put_h264_qpel_8_mc12_8_neon: 878.0 509.5 537.5
> put_h264_qpel_8_mc13_8_neon: 423.7 217.0 252.0
> put_h264_qpel_8_mc21_8_neon: 858.2 479.5 514.0
> put_h264_qpel_8_mc22_8_neon: 649.7 385.2 403.0
> put_h264_qpel_8_mc23_8_neon: 860.2 476.5 517.7
> put_h264_qpel_8_mc31_8_neon: 437.2 219.5 252.5
> put_h264_qpel_8_mc32_8_neon: 892.5 510.5 546.0
> put_h264_qpel_8_mc33_8_neon: 438.2 218.5 257.0
> put_h264_qpel_16_mc01_8_neon: 944.2 509.7 546.7
> put_h264_qpel_16_mc02_8_neon: 878.7 469.5 509.7
> put_h264_qpel_16_mc03_8_neon: 945.7 510.7 557.0
> put_h264_qpel_16_mc11_8_neon: 1663.2 858.5 979.5
> put_h264_qpel_16_mc12_8_neon: 3510.2 2027.7 2112.7
> put_h264_qpel_16_mc13_8_neon: 1664.7 857.5 980.5
> put_h264_qpel_16_mc21_8_neon: 3366.2 1928.5 2030.5
> put_h264_qpel_16_mc22_8_neon: 2584.7 1514.7 1590.2
> put_h264_qpel_16_mc23_8_neon: 3367.7 1927.7 2035.0
> put_h264_qpel_16_mc31_8_neon: 1716.7 849.7 997.0
> put_h264_qpel_16_mc32_8_neon: 3564.0 2044.2 3835.2
> put_h264_qpel_16_mc33_8_neon: 1717.7 863.0 989.5
>
> After:
> put_h264_qpel_8_mc01_8_neon: 136.0 73.7 76.0
> put_h264_qpel_8_mc02_8_neon: 108.7 65.0 64.0
> put_h264_qpel_8_mc03_8_neon: 137.5 72.7 73.0
> put_h264_qpel_8_mc11_8_neon: 316.2 159.0 188.5
> put_h264_qpel_8_mc12_8_neon: 653.0 375.5 384.7
> put_h264_qpel_8_mc13_8_neon: 318.7 165.5 189.5
> put_h264_qpel_8_mc21_8_neon: 739.2 385.7 432.5
> put_h264_qpel_8_mc22_8_neon: 530.7 295.5 309.5
> put_h264_qpel_8_mc23_8_neon: 741.2 393.7 421.0
> put_h264_qpel_8_mc31_8_neon: 332.2 162.5 190.0
> put_h264_qpel_8_mc32_8_neon: 667.5 378.2 390.5
> put_h264_qpel_8_mc33_8_neon: 332.7 166.5 195.5
> put_h264_qpel_16_mc01_8_neon: 524.2 285.2 294.0
> put_h264_qpel_16_mc02_8_neon: 454.7 252.2 250.2
> put_h264_qpel_16_mc03_8_neon: 525.7 286.0 283.0
> put_h264_qpel_16_mc11_8_neon: 1243.2 630.7 726.7
> put_h264_qpel_16_mc12_8_neon: 2610.2 1479.7 1481.2
> put_h264_qpel_16_mc13_8_neon: 1250.5 631.7 727.7
> put_h264_qpel_16_mc21_8_neon: 2890.2 1571.2 1679.7
> put_h264_qpel_16_mc22_8_neon: 2108.7 1177.5 1223.5
> put_h264_qpel_16_mc23_8_neon: 2891.7 1578.7 1667.7
> put_h264_qpel_16_mc31_8_neon: 1296.7 630.5 752.5
> put_h264_qpel_16_mc32_8_neon: 2664.0 1483.2 1503.5
> put_h264_qpel_16_mc33_8_neon: 1297.7 632.5 747.2
>
> I.e. overall a 20%-60% reduction in runtime of these
> functions.
> ---
> libavcodec/aarch64/h264qpel_neon.S | 111 +++++++++++++++--------------
> 1 file changed, 56 insertions(+), 55 deletions(-)
Pushed.
// Martin
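
For readers who want to check the quoted "20%-60%" summary against individual rows, here is a minimal sketch of the arithmetic. It is not part of the patch. The helper name `reduction_pct` and the choice of sample rows are illustrative assumptions. The message does not label the three timing columns, so only the first column is used here and it is treated as an opaque timing value.

```python
def reduction_pct(before: float, after: float) -> float:
    """Return the percentage reduction in runtime going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# First-column figures copied from the Before/After tables in the quoted message.
# Which hardware each column corresponds to is not stated there.
samples = {
    "put_h264_qpel_8_mc02_8_neon": (214.7, 108.7),
    "put_h264_qpel_16_mc01_8_neon": (944.2, 524.2),
    "put_h264_qpel_16_mc12_8_neon": (3510.2, 2610.2),
}

if __name__ == "__main__":
    for name, (before, after) in samples.items():
        print(f"{name}: {reduction_pct(before, after):.1f}% less time")
```

The same helper can be applied to any other row or column of the tables. Individual rows may land somewhat outside the quoted range, which is stated as an overall figure.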