[FFmpeg-devel] [PATCH] x86/intmath: allow the source operand for icc/icl ff_ctzll to be a memory location
Matt Oliver
protogonoi at gmail.com
Sun Oct 25 17:39:42 CET 2015
On 26 October 2015 at 00:43, Hendrik Leppkes <h.leppkes at gmail.com> wrote:
> On Sun, Oct 25, 2015 at 12:30 PM, Matt Oliver <protogonoi at gmail.com>
> wrote:
> > On 25 October 2015 at 22:16, Hendrik Leppkes <h.leppkes at gmail.com>
> wrote:
> >
> >> On Sun, Oct 25, 2015 at 11:26 AM, Matt Oliver <protogonoi at gmail.com>
> >> wrote:
> >> > On 25 October 2015 at 12:22, Ganesh Ajjanagadde <gajjanag at mit.edu>
> >> wrote:
> >> >
> >> >> On Sat, Oct 24, 2015 at 7:03 PM, James Almer <jamrial at gmail.com>
> wrote:
> >> >> > On 10/24/2015 7:48 PM, Ganesh Ajjanagadde wrote:
> >> >> >> On Sat, Oct 24, 2015 at 6:08 PM, James Almer <jamrial at gmail.com>
> >> wrote:
> >> >> >>> This gives the compiler some flexibility
> >> >> >>>
> >> >> >>> Signed-off-by: James Almer <jamrial at gmail.com>
> >> >> >>> ---
> >> >> >>> libavutil/x86/intmath.h | 2 +-
> >> >> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >> >> >>>
> >> >> >>> diff --git a/libavutil/x86/intmath.h b/libavutil/x86/intmath.h
> >> >> >>> index 7881e3c..10d3e7f 100644
> >> >> >>> --- a/libavutil/x86/intmath.h
> >> >> >>> +++ b/libavutil/x86/intmath.h
> >> >> >>> @@ -36,7 +36,7 @@ static av_always_inline av_const int
> >> >> ff_ctzll_x86(long long v)
> >> >> >>> {
> >> >> >>> # if ARCH_X86_64
> >> >> >>> uint64_t c;
> >> >> >>> - __asm__("bsfq %1,%0" : "=r" (c) : "r" (v));
> >> >> >>> + __asm__ ("bsfq %1,%0" : "=r" (c) : "rm" (v));
> >> >> >>> return c;
> >> >> >>> # else
> >> >> >>> return ((uint32_t)v == 0) ? _bit_scan_forward((uint32_t)(v
> >>
> >> >> 32)) + 32 : _bit_scan_forward((uint32_t)v);
> >> >> >>> --
> >> >> >>> 2.6.2
> >> >> >>
> >> >> >> This question may be silly, but can you clarify when this asm
> code is
> >> >> >> used instead of __builtin_ctzll? For instance, I think on
> GNU/Linux,
> >> >> >> x86-64, sufficiently modern GCC (even with asm enabled), we should
> >> >> >> always use the builtin: the compiler knows best what to do with
> its
> >> >> >> builtin.
> >> >> >
> >> >> > This is ICC/ICL, not GCC.
> >> >>
> >> >> Ah, now I recall that _bit_scan_forward64 was not always available on
> >> >> ICC. Anyway, this is just a heads up to whoever uses ICC/ICL and the
> >> >> like: you may want to find out to which versions this built in
> >> >> applies.
> >> >
> >> >
> >> > bit_scan_forward64 isnt available on ICL at all, hence the use of asm.
> >> >
> >>
> >> Intels intrinsic guide lists _BitScanForward64 as the intrinsic to use,
> >> however.
> >>
> >>
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=Other&expand=373
> >
> >
> > _BitScanForward64 is a msvc intrinsic only available on windows
> platforms
> > so not usable with ICC. The native asm implementation also performs
> better
> > with ICL hence its use.
>
> The "Intel(R) C++ Compiler User and Reference Guide" would care to
> disagree with you.
>
Well ill be... I dont have ICC on hand to check it but ill take Intels word
for it. Its interesting that Intel decided to support that intrinsic as
opposed to just extending there own version to 64b (i.e.
_bit_scan_forward64). _BitScanForward64 as far as I know was introduced by
msvc and made available in "winnt.h" hence the different naming convention
to other intrinsics. That said the inline asm version is still preferable
with Intel as the intrinsic passes the result via pointer which when tested
with msvc11 and 12 crt's resulted in horrible performance hits as the
compiler didnt optimise out the memory access in order to keep the result
in register (works fine with newer ICL 16 and msvcrt14 though).
Using tzcnt would be the more interesting option. Changing the asm to tzcnt
works with ICL. But also the use of the tzcnt intrinsic (_tzcnt_u64) would
be possible with both intel and msvc. I dont have anything older than
Haswell or Piledriver to double check but I have seen its use in several
other projects without issue so in theory it shouldnt be a problem.
If tzcnt is preferable I can make up a patch to convert the intel and msvc
versions to use the required intrinsic.
More information about the ffmpeg-devel
mailing list