[FFmpeg-devel] [PATCH] aarch64/h26x: optimize sao_band_filter
Zhao Zhili
quinkblack at foxmail.com
Tue Apr 29 11:14:37 EEST 2025
> On Apr 29, 2025, at 15:58, Martin Storsjö <martin at martin.st> wrote:
>
> On Tue, 29 Apr 2025, Zhao Zhili wrote:
>
>>> On Apr 25, 2025, at 16:25, Martin Storsjö <martin at martin.st> wrote:
>>> On Tue, 15 Apr 2025, Zhao Zhili wrote:
>>>> + tbx v3.8b, {v16.16b-v17.16b}, v3.8b
>>> Is there any specific reason for preferring tbx over tbl here? (I know the existing code used tbx.) Without having studied cycle tables, I would expect tbl to maybe be slightly simpler, but perhaps there's no difference (or tbx is faster)?
>>
>> tbl can be faster, and the result is quite impressive. I changed it to tbl before pushing.
>>
>>                           Before            tbx               tbl
>> hevc_sao_band_8_8_c:        252.3 ( 1.00x)    252.3 ( 1.00x)    252.3 ( 1.00x)
>> hevc_sao_band_8_8_neon:      95.8 ( 2.63x)     61.0 ( 4.14x)     61.0 ( 4.57x)
>> hevc_sao_band_16_8_c:       875.2 ( 1.00x)    864.9 ( 1.00x)    864.9 ( 1.00x)
>> hevc_sao_band_16_8_neon:    317.5 ( 2.76x)    150.0 ( 5.76x)    150.0 ( 6.26x)
>> hevc_sao_band_32_8_c:      3853.5 ( 1.00x)   3871.6 ( 1.00x)   3871.6 ( 1.00x)
>> hevc_sao_band_32_8_neon:   1222.3 ( 3.15x)    550.6 ( 7.03x)    550.6 ( 7.39x)
>> hevc_sao_band_48_8_c:      8203.6 ( 1.00x)   8182.6 ( 1.00x)   8182.6 ( 1.00x)
>> hevc_sao_band_48_8_neon:   2685.7 ( 3.05x)   1185.8 ( 6.90x)   1185.8 ( 7.36x)
>> hevc_sao_band_64_8_c:     14023.0 ( 1.00x)  14038.9 ( 1.00x)  14038.9 ( 1.00x)
>> hevc_sao_band_64_8_neon:   4783.2 ( 2.93x)   2078.4 ( 6.75x)   2078.4 ( 7.15x)
>
> The cycle numbers in the tbl and tbx columns seem to be identical here, while the relative speedup numbers differ - was this some sort of copypaste mistake in preparing the table? (The difference in speedup numbers does seem impressive.)
They are the same on A75, but not on A76/A77/X3.
tbl: 2 cycles with 1 or 2 table registers.
tbx: 2 cycles with 1 table register, 4 cycles with 2 table registers.
The code uses 2 table registers.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 122049 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20250429/4ab8c03b/attachment.png>
-------------- next part --------------
>
> // Martin