[FFmpeg-devel] [PATCH 2/2] libavutil: add bmi2 optimized av_zhb
Michael Niedermayer
michaelni at gmx.at
Tue Mar 17 12:53:22 CET 2015
On Tue, Mar 17, 2015 at 01:08:06AM -0300, James Almer wrote:
> Signed-off-by: James Almer <jamrial at gmail.com>
> ---
> GCC apparently can't generate a bzhi instruction on its own from the C version,
> so here's a custom implementation.
>
> Before:
>
> gcc -O3
> <av_zhb_c>:
>    0: 89 f1             mov    ecx,esi
>    2: ba 01 00 00 00    mov    edx,0x1
>    7: d3 e2             shl    edx,cl
>    9: 83 ea 01          sub    edx,0x1
>    c: 89 d0             mov    eax,edx
>    e: 21 f8             and    eax,edi
>   10: c3                ret
>
> gcc -mbmi2 -O3
> <av_zhb_c>:
>    0: ba 01 00 00 00    mov    edx,0x1
>    5: c4 e2 49 f7 d2    shlx   edx,edx,esi
>    a: 8d 42 ff          lea    eax,[rdx-0x1]
>    d: 21 f8             and    eax,edi
>    f: c3                ret
>
> After:
>
> gcc -mbmi2 -O3
> <av_zhb_bmi2>:
>    0: c4 e2 48 f5 c7    bzhi   eax,edi,esi
>    5: c3                ret
>
> The non-bmi2 example is a bit bloated with movs that put values in ecx (needed
> for shl) and eax (return value) since, unlike the actual function, it was not
> inlined. Still, when p is not a constant, the best case is
> mov + shl + sub/dec/lea + and, versus a single bzhi.
Orthogonal to this patch, you or someone else might want to submit a patch
to GCC so that it generates this optimization automatically.
[...]
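
For reference, a minimal sketch of the two variants being compared above.
This is not the code from the patch: the names zhb_c and zhb are hypothetical,
and the BMI2 path assumes the compiler's _bzhi_u32 intrinsic from
<immintrin.h> rather than whatever inline asm the patch uses. Both are only
meant for 0 <= p <= 31.

```c
#include <stdint.h>
#ifdef __BMI2__
#include <immintrin.h>   /* _bzhi_u32, only available when building with -mbmi2 */
#endif

/* Portable version: keep the low p bits of a and clear the rest.
 * Assumes 0 <= p <= 31; a shift by 32 or more is undefined in C. */
static inline uint32_t zhb_c(uint32_t a, unsigned p)
{
    return a & ((1U << p) - 1);
}

/* Hypothetical wrapper: use a single bzhi when BMI2 is enabled at
 * compile time, otherwise fall back to the mask-and-AND version. */
static inline uint32_t zhb(uint32_t a, unsigned p)
{
#ifdef __BMI2__
    return _bzhi_u32(a, p);
#else
    return zhb_c(a, p);
#endif
}
```

With -mbmi2 the wrapper should compile down to the single bzhi shown in the
"After" listing; without it, it is the same shift/sub/and sequence as before.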
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The bravest are surely those who have the clearest vision
of what is before them, glory and danger alike, and yet
notwithstanding go out to meet it. -- Thucydides