[FFmpeg-devel] [PATCH] aarch64/h26x: optimize sao_band_filter

Zhao Zhili quinkblack at foxmail.com
Tue Apr 29 11:14:37 EEST 2025



> On Apr 29, 2025, at 15:58, Martin Storsjö <martin at martin.st> wrote:
> 
> On Tue, 29 Apr 2025, Zhao Zhili wrote:
> 
>>> On Apr 25, 2025, at 16:25, Martin Storsjö <martin at martin.st> wrote:
>>> On Tue, 15 Apr 2025, Zhao Zhili wrote:
>>>> +        tbx             v3.8b, {v16.16b-v17.16b}, v3.8b
>>> Is there any specific reason for preferring tbx over tbl here? (I know the existing code used tbx.) Without having studied cycle tables, I would expect tbl to maybe be slightly simpler, but perhaps there's no difference (or tbx is faster)?
>> 
>> tbl can be faster. The result is quite impressive. Changed to tbl before push.
>> 
>>                            Before               tbx             tbl
>> hevc_sao_band_8_8_c:          252.3 ( 1.00x)     252.3 ( 1.00x)    252.3 ( 1.00x)
>> hevc_sao_band_8_8_neon:        95.8 ( 2.63x)      61.0 ( 4.14x)     61.0 ( 4.57x)
>> hevc_sao_band_16_8_c:         875.2 ( 1.00x)     864.9 ( 1.00x)    864.9 ( 1.00x)
>> hevc_sao_band_16_8_neon:      317.5 ( 2.76x)     150.0 ( 5.76x)    150.0 ( 6.26x)
>> hevc_sao_band_32_8_c:        3853.5 ( 1.00x)    3871.6 ( 1.00x)   3871.6 ( 1.00x)
>> hevc_sao_band_32_8_neon:     1222.3 ( 3.15x)     550.6 ( 7.03x)    550.6 ( 7.39)
>> hevc_sao_band_48_8_c:        8203.6 ( 1.00x)    8182.6 ( 1.00x)   8182.6 ( 1.00x)
>> hevc_sao_band_48_8_neon:     2685.7 ( 3.05x)    1185.8 ( 6.90x)   1185.8 ( 7.36x)
>> hevc_sao_band_64_8_c:       14023.0 ( 1.00x)   14038.9 ( 1.00x)  14038.9 ( 1.00x)
>> hevc_sao_band_64_8_neon:     4783.2 ( 2.93x)    2078.4 ( 6.75x)   2078.4 ( 7.15x)
> 
> The cycle numbers in the tbl and tbx columns seem to be identical here, while the relative speedup numbers differ - was this some sort of copypaste mistake in preparing the table? (The difference in speedup numbers does seem impressive.)

They are the same on A75, but not on A76/A77/X3.

tbl: 2 cycle for 1 or 2 table register
tbx: 2 cycle for 1 table register, 4 for 2 table register.

The code use 2 table register.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 122049 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20250429/4ab8c03b/attachment.png>
-------------- next part --------------

> 
> // Martin
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".



More information about the ffmpeg-devel mailing list