[FFmpeg-devel] [PATCH] aarch64/h26x: optimize sao_band_filter
Martin Storsjö
martin at martin.st
Tue Apr 29 10:58:57 EEST 2025
On Tue, 29 Apr 2025, Zhao Zhili wrote:
>> On Apr 25, 2025, at 16:25, Martin Storsjö <martin at martin.st> wrote:
>>
>> On Tue, 15 Apr 2025, Zhao Zhili wrote:
>>
>>
>>> + tbx v3.8b, {v16.16b-v17.16b}, v3.8b
>>
>> Is there any specific reason for preferring tbx over tbl here? (I know the existing code used tbx.) Without having studied cycle tables, I would expect tbl to maybe be slightly simpler, but perhaps there's no difference (or tbx is faster)?
>
> tbl can be faster. The result is quite impressive. Changed to tbl before push.
>
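For reference, the architectural difference between the two instructions is only in how out-of-range indices are handled: tbl writes 0 to that lane, while tbx leaves the destination byte untouched, so tbx has to read the old destination as an extra input. Below is a minimal Python sketch of that behaviour, not the actual NEON code. The function names and sample values are made up for illustration, and it assumes a 32-byte table (as with {v16.16b-v17.16b}) and the same register used as both index and destination, as in the quoted line.

```python
def tbl(table, indices):
    """Emulate TBL: an out-of-range index produces 0 in that lane."""
    return [table[i] if i < len(table) else 0 for i in indices]


def tbx(dest, table, indices):
    """Emulate TBX: an out-of-range index keeps the old destination byte."""
    return [table[i] if i < len(table) else d for d, i in zip(dest, indices)]


# Hypothetical 32-entry byte table standing in for {v16.16b-v17.16b}.
table = list(range(100, 132))

# Eight index lanes; 32, 255 and 200 are out of range for a 32-byte table.
idx = [0, 5, 31, 32, 255, 7, 16, 200]

tbl_result = tbl(table, idx)
# Destination and index are the same register in the quoted instruction.
tbx_result = tbx(idx, table, idx)

# With every index in 0..31 the two instructions give the same result.
in_range = list(range(32))
same_when_in_range = tbl(table, in_range) == tbx(in_range, table, in_range)
```

If the band index is always within 0..31 (which I believe is the case for a 5-bit band index, but I have not checked the patch for it), the two are interchangeable in terms of output, and only the timing can differ.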
>                           Before            tbx               tbl
> hevc_sao_band_8_8_c:        252.3 ( 1.00x)    252.3 ( 1.00x)    252.3 ( 1.00x)
> hevc_sao_band_8_8_neon:      95.8 ( 2.63x)     61.0 ( 4.14x)     61.0 ( 4.57x)
> hevc_sao_band_16_8_c:       875.2 ( 1.00x)    864.9 ( 1.00x)    864.9 ( 1.00x)
> hevc_sao_band_16_8_neon:    317.5 ( 2.76x)    150.0 ( 5.76x)    150.0 ( 6.26x)
> hevc_sao_band_32_8_c:      3853.5 ( 1.00x)   3871.6 ( 1.00x)   3871.6 ( 1.00x)
> hevc_sao_band_32_8_neon:   1222.3 ( 3.15x)    550.6 ( 7.03x)    550.6 ( 7.39x)
> hevc_sao_band_48_8_c:      8203.6 ( 1.00x)   8182.6 ( 1.00x)   8182.6 ( 1.00x)
> hevc_sao_band_48_8_neon:   2685.7 ( 3.05x)   1185.8 ( 6.90x)   1185.8 ( 7.36x)
> hevc_sao_band_64_8_c:     14023.0 ( 1.00x)  14038.9 ( 1.00x)  14038.9 ( 1.00x)
> hevc_sao_band_64_8_neon:   4783.2 ( 2.93x)   2078.4 ( 6.75x)   2078.4 ( 7.15x)

The cycle counts in the tbx and tbl columns are identical here, while
the relative speedup numbers differ. Was this a copy-paste mistake in
preparing the table? (The difference in the speedup numbers does look
impressive.)

// Martin
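
To make the mismatch concrete, here is a small Python sketch that recomputes the speedup as c cycles divided by neon cycles from the table as posted, and compares it with the printed speedups. The tolerance and the variable names are my own choices, and the last field assumes the C cycle counts would be unchanged in a real tbl run, which the table cannot confirm.

```python
# (name, c_cycles, neon_cycles, printed tbx speedup, printed tbl speedup)
# The cycle counts are the ones shown in both the tbx and tbl columns.
rows = [
    ("hevc_sao_band_8_8", 252.3, 61.0, 4.14, 4.57),
    ("hevc_sao_band_16_8", 864.9, 150.0, 5.76, 6.26),
    ("hevc_sao_band_32_8", 3871.6, 550.6, 7.03, 7.39),
    ("hevc_sao_band_48_8", 8182.6, 1185.8, 6.90, 7.36),
    ("hevc_sao_band_64_8", 14038.9, 2078.4, 6.75, 7.15),
]


def check(rows, tol=0.02):
    """Compare c/neon against the printed speedups, within a rounding tolerance."""
    results = []
    for name, c, neon, tbx_x, tbl_x in rows:
        ratio = c / neon
        results.append((
            name,
            round(ratio, 2),            # speedup implied by the cycle counts
            abs(ratio - tbx_x) <= tol,  # matches the printed tbx speedup?
            abs(ratio - tbl_x) <= tol,  # matches the printed tbl speedup?
            round(c / tbl_x, 1),        # neon cycles implied by the tbl speedup
        ))
    return results


results = check(rows)
for row in results:
    print(row)
```

The cycle counts agree with the tbx speedups in every row and with none of the tbl speedups, so the tbl cycle columns look like they were copied from the tbx run while the speedup figures came from a separate one. If the C numbers were about the same in that run, the 8x8 case would be somewhere around 55 cycles rather than 61.0.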