[FFmpeg-devel] [PATCH] aarch64/h26x: optimize sao_band_filter

Martin Storsjö martin at martin.st
Tue Apr 29 10:58:57 EEST 2025


On Tue, 29 Apr 2025, Zhao Zhili wrote:

>> On Apr 25, 2025, at 16:25, Martin Storsjö <martin at martin.st> wrote:
>> 
>> On Tue, 15 Apr 2025, Zhao Zhili wrote:
>> 
>> 
>>> +        tbx             v3.8b, {v16.16b-v17.16b}, v3.8b
>> 
>> Is there any specific reason for preferring tbx over tbl here? (I know the existing code used tbx.) Without having studied cycle tables, I would expect tbl to maybe be slightly simpler, but perhaps there's no difference (or tbx is faster)?
>
> tbl can be faster. The result is quite impressive. Changed to tbl before push.
>
>                             Before               tbx             tbl
> hevc_sao_band_8_8_c:          252.3 ( 1.00x)     252.3 ( 1.00x)    252.3 ( 1.00x)
> hevc_sao_band_8_8_neon:        95.8 ( 2.63x)      61.0 ( 4.14x)     61.0 ( 4.57x)
> hevc_sao_band_16_8_c:         875.2 ( 1.00x)     864.9 ( 1.00x)    864.9 ( 1.00x)
> hevc_sao_band_16_8_neon:      317.5 ( 2.76x)     150.0 ( 5.76x)    150.0 ( 6.26x)
> hevc_sao_band_32_8_c:        3853.5 ( 1.00x)    3871.6 ( 1.00x)   3871.6 ( 1.00x)
> hevc_sao_band_32_8_neon:     1222.3 ( 3.15x)     550.6 ( 7.03x)    550.6 ( 7.39x)
> hevc_sao_band_48_8_c:        8203.6 ( 1.00x)    8182.6 ( 1.00x)   8182.6 ( 1.00x)
> hevc_sao_band_48_8_neon:     2685.7 ( 3.05x)    1185.8 ( 6.90x)   1185.8 ( 7.36x)
> hevc_sao_band_64_8_c:       14023.0 ( 1.00x)   14038.9 ( 1.00x)  14038.9 ( 1.00x)
> hevc_sao_band_64_8_neon:     4783.2 ( 2.93x)    2078.4 ( 6.75x)   2078.4 ( 7.15x)
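
A scalar C model of the two instructions discussed above may help. It covers only the two-register (32-byte) table and the 8-byte form, as in `tbl v3.8b, {v16.16b-v17.16b}, v3.8b`, and the function names are hypothetical. With TBL, a lane whose index is 32 or more becomes 0. With TBX, that lane keeps whatever the destination already held. When every index is known to be below 32, the two give identical results, so switching from tbx to tbl only affects speed. Whether the patch's indices are always in range (for instance an 8-bit band index computed as pixel >> 3) is an assumption here and is not stated in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two-register table ({v16.16b-v17.16b}) and 8-byte vectors (.8b). */
enum { TABLE_BYTES = 32, LANES = 8 };

/* TBL: a lane whose index is >= TABLE_BYTES becomes 0.
 * dst may alias idx (as in "tbl v3.8b, {...}, v3.8b"), since each
 * lane reads its own index before writing its own result. */
static void tbl2_8b(uint8_t dst[LANES], const uint8_t table[TABLE_BYTES],
                    const uint8_t idx[LANES])
{
    assert(dst && table && idx);
    for (int i = 0; i < LANES; i++)
        dst[i] = idx[i] < TABLE_BYTES ? table[idx[i]] : 0;
}

/* TBX: a lane whose index is >= TABLE_BYTES keeps its old dst value. */
static void tbx2_8b(uint8_t dst[LANES], const uint8_t table[TABLE_BYTES],
                    const uint8_t idx[LANES])
{
    assert(dst && table && idx);
    for (int i = 0; i < LANES; i++)
        if (idx[i] < TABLE_BYTES)
            dst[i] = table[idx[i]];
}

/* Returns 1 if TBL and TBX agree for these indices when dst starts out
 * holding a nonzero stale value. That happens exactly when every index
 * is below TABLE_BYTES. */
static int tbl_matches_tbx(const uint8_t table[TABLE_BYTES],
                           const uint8_t idx[LANES])
{
    uint8_t a[LANES], b[LANES];
    memset(b, 0xAA, sizeof(b)); /* arbitrary nonzero stale contents */
    tbl2_8b(a, table, idx);
    tbx2_8b(b, table, idx);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

For example, with indices {0, 5, 31, 32, 200, 7, 16, 255}, TBL zeroes lanes 3, 4 and 7, while TBX leaves them as they were. In the aliased form above, an out-of-range lane under tbx would simply keep its own index byte. With all indices in 0..31 the two agree. One plausible reason tbl can be faster is that TBX also has to read the old destination value, an extra dependency TBL does not have.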

The cycle numbers in the tbl and tbx columns seem to be identical here, 
while the relative speedup numbers differ - was this some sort of 
copy-paste mistake in preparing the table? (The difference in speedup 
numbers does seem impressive.)
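
The ratios in the table are consistent with speedup = C cycles / NEON cycles (e.g. 252.3 / 95.8 ≈ 2.63 in the Before column), so a quick check makes the mismatch concrete. This is only an illustration of that arithmetic. The helper names and the 0.02 tolerance are assumptions, not part of checkasm.

```c
#include <assert.h>

/* Speedup as the table's ratios appear to be computed:
 * C reference cycles divided by NEON cycles. */
static double speedup(double c_cycles, double neon_cycles)
{
    assert(neon_cycles > 0.0);
    return c_cycles / neon_cycles;
}

/* 1 if a reported speedup matches the printed cycle counts. The
 * tolerance allows some slack for counts printed with one decimal. */
static int speedup_consistent(double c_cycles, double neon_cycles,
                              double reported)
{
    double diff = speedup(c_cycles, neon_cycles) - reported;
    return diff > -0.02 && diff < 0.02;
}

/* NEON cycle count that a reported speedup would imply. */
static double implied_neon_cycles(double c_cycles, double reported)
{
    assert(reported > 0.0);
    return c_cycles / reported;
}
```

Applied to the 8x8 row, 252.3 / 61.0 ≈ 4.14 matches the tbx column, while the 4.57x in the tbl column would imply roughly 55 cycles (252.3 / 4.57 ≈ 55.2). The 16x16 row behaves the same way: 864.9 / 6.26 ≈ 138 cycles rather than 150.0. That fits the guess that the tbl cycle counts were carried over from the tbx column.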

// Martin

