[FFmpeg-devel] [PATCH 2/2] aarch64: h264qpel: Do vertical filtering without transposing

Martin Storsjö martin at martin.st
Mon Oct 18 14:34:43 EEST 2021


On Fri, 3 Sep 2021, Martin Storsjö wrote:

> This gives rather big speedups on these functions:
>
> Before:
> put_h264_qpel_8_mc01_8_neon:     241.0   131.5   138.7
> put_h264_qpel_8_mc02_8_neon:     214.7   121.2   127.5
> put_h264_qpel_8_mc03_8_neon:     242.5   131.2   135.7
> put_h264_qpel_8_mc11_8_neon:     421.2   218.7   251.0
> put_h264_qpel_8_mc12_8_neon:     878.0   509.5   537.5
> put_h264_qpel_8_mc13_8_neon:     423.7   217.0   252.0
> put_h264_qpel_8_mc21_8_neon:     858.2   479.5   514.0
> put_h264_qpel_8_mc22_8_neon:     649.7   385.2   403.0
> put_h264_qpel_8_mc23_8_neon:     860.2   476.5   517.7
> put_h264_qpel_8_mc31_8_neon:     437.2   219.5   252.5
> put_h264_qpel_8_mc32_8_neon:     892.5   510.5   546.0
> put_h264_qpel_8_mc33_8_neon:     438.2   218.5   257.0
> put_h264_qpel_16_mc01_8_neon:    944.2   509.7   546.7
> put_h264_qpel_16_mc02_8_neon:    878.7   469.5   509.7
> put_h264_qpel_16_mc03_8_neon:    945.7   510.7   557.0
> put_h264_qpel_16_mc11_8_neon:   1663.2   858.5   979.5
> put_h264_qpel_16_mc12_8_neon:   3510.2  2027.7  2112.7
> put_h264_qpel_16_mc13_8_neon:   1664.7   857.5   980.5
> put_h264_qpel_16_mc21_8_neon:   3366.2  1928.5  2030.5
> put_h264_qpel_16_mc22_8_neon:   2584.7  1514.7  1590.2
> put_h264_qpel_16_mc23_8_neon:   3367.7  1927.7  2035.0
> put_h264_qpel_16_mc31_8_neon:   1716.7   849.7   997.0
> put_h264_qpel_16_mc32_8_neon:   3564.0  2044.2  3835.2
> put_h264_qpel_16_mc33_8_neon:   1717.7   863.0   989.5
>
> After:
> put_h264_qpel_8_mc01_8_neon:     136.0    73.7    76.0
> put_h264_qpel_8_mc02_8_neon:     108.7    65.0    64.0
> put_h264_qpel_8_mc03_8_neon:     137.5    72.7    73.0
> put_h264_qpel_8_mc11_8_neon:     316.2   159.0   188.5
> put_h264_qpel_8_mc12_8_neon:     653.0   375.5   384.7
> put_h264_qpel_8_mc13_8_neon:     318.7   165.5   189.5
> put_h264_qpel_8_mc21_8_neon:     739.2   385.7   432.5
> put_h264_qpel_8_mc22_8_neon:     530.7   295.5   309.5
> put_h264_qpel_8_mc23_8_neon:     741.2   393.7   421.0
> put_h264_qpel_8_mc31_8_neon:     332.2   162.5   190.0
> put_h264_qpel_8_mc32_8_neon:     667.5   378.2   390.5
> put_h264_qpel_8_mc33_8_neon:     332.7   166.5   195.5
> put_h264_qpel_16_mc01_8_neon:    524.2   285.2   294.0
> put_h264_qpel_16_mc02_8_neon:    454.7   252.2   250.2
> put_h264_qpel_16_mc03_8_neon:    525.7   286.0   283.0
> put_h264_qpel_16_mc11_8_neon:   1243.2   630.7   726.7
> put_h264_qpel_16_mc12_8_neon:   2610.2  1479.7  1481.2
> put_h264_qpel_16_mc13_8_neon:   1250.5   631.7   727.7
> put_h264_qpel_16_mc21_8_neon:   2890.2  1571.2  1679.7
> put_h264_qpel_16_mc22_8_neon:   2108.7  1177.5  1223.5
> put_h264_qpel_16_mc23_8_neon:   2891.7  1578.7  1667.7
> put_h264_qpel_16_mc31_8_neon:   1296.7   630.5   752.5
> put_h264_qpel_16_mc32_8_neon:   2664.0  1483.2  1503.5
> put_h264_qpel_16_mc33_8_neon:   1297.7   632.5   747.2
>
> I.e. overall a 20%-60% reduction in runtime of these
> functions.
> ---
> libavcodec/aarch64/h264qpel_neon.S | 111 +++++++++++++++--------------
> 1 file changed, 56 insertions(+), 55 deletions(-)
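
For readers who want to check the quoted percentage against the tables, the
relative reduction for a row is (before - after) / before. The following is a
minimal sketch, not part of the patch: the three rows are copied from the
tables above, while the names reduction_percent and reductions are
hypothetical helpers introduced only for illustration. The message does not
label the three columns, so they are treated here simply as three independent
measurements per function.

```python
# Timings copied verbatim from the Before/After tables in the quoted message.
# Each tuple holds the three (unlabelled) columns for one function.
before = {
    "put_h264_qpel_8_mc02_8_neon": (214.7, 121.2, 127.5),
    "put_h264_qpel_8_mc21_8_neon": (858.2, 479.5, 514.0),
    "put_h264_qpel_16_mc01_8_neon": (944.2, 509.7, 546.7),
}
after = {
    "put_h264_qpel_8_mc02_8_neon": (108.7, 65.0, 64.0),
    "put_h264_qpel_8_mc21_8_neon": (739.2, 385.7, 432.5),
    "put_h264_qpel_16_mc01_8_neon": (524.2, 285.2, 294.0),
}


def reduction_percent(old, new):
    """Return the runtime reduction from old to new, as a percentage of old."""
    return 100.0 * (old - new) / old


def reductions(before, after):
    """Map each function name to its per-column reduction, rounded to 0.1%."""
    return {
        name: tuple(
            round(reduction_percent(old, new), 1)
            for old, new in zip(before[name], after[name])
        )
        for name in before
    }


if __name__ == "__main__":
    for name, values in reductions(before, after).items():
        print(name, values)
```

The three rows are only a sample; the same helper applies to any other row of
the tables once its numbers are added to both dictionaries.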

Pushed.

// Martin

