[FFmpeg-devel] [PATCH 5/5] avfilter/vf_yadif: Add x86_64 avx yadif asm
Chris Phlipot
cphlipot0 at gmail.com
Thu Jul 21 05:30:46 EEST 2022
Thanks for calling that out. It looks like I was cross-compiling for 32-bit
incorrectly from my 64-bit host. I've reproduced the failure and submitted
a v2 with the fix. If you're still seeing build failures even after v2, can
you also provide more details on how you are building so I can reproduce
and fix?
- Chris
On Wed, Jul 20, 2022 at 6:17 AM Michael Niedermayer <michael at niedermayer.cc>
wrote:
> On Tue, Jul 19, 2022 at 09:41:17PM -0700, Chris Phlipot wrote:
> > Add a new version of yadif_filter_line performed using packed bytes
> > instead of the packed words used by the current implementation. As
> > a result this implementation runs almost 2x as fast as the current
> > fastest SSSE3 implementation.
> >
> > This implementation is written from scratch based on the C code, with
> > the goal of keeping all intermediate values within 8 bits so that
> > the vectorized code can be computed using packed bytes. The
> > differences are as follows:
> > - Use algorithms to compute avg and abs difference using only 8-bit
> > intermediate values.
> > - Reworked the mode 1 code by applying various mathematical identities
> > to keep all intermediate values within 8 bits.
> > - Attempt to compute the spatial score using only 8 bits. The actual
> > spatial score fits within this range for the entire 128-bit xmm
> > vector about 97% of the time (content dependent). When the spatial
> > score needs more than 8 bits to be represented, we detect this case
> > and recompute the spatial score using 16-bit packed words instead.
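
The first bullet above does not spell out which byte-only algorithms are used. As a rough illustration only (this is not code from the patch), here is a scalar C sketch of two well-known identities that give an average and an absolute difference without any intermediate value exceeding 255. The function names are hypothetical, and whether the patch rounds its average up or down is not stated here, so the floor variant is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Floor average: the bits common to a and b, plus half of the bits that
 * differ. The sum never exceeds 255, so it fits in a byte lane. */
static uint8_t avg_floor_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a & b) + ((a ^ b) >> 1));
}

/* Unsigned saturating subtract, the scalar analogue of PSUBUSB. */
static uint8_t subus_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* |a - b|: one of the two saturating differences is always zero,
 * so OR-ing them yields the absolute difference. */
static uint8_t absdiff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(subus_u8(a, b) | subus_u8(b, a));
}

/* Exhaustively compare against a widened reference for all byte pairs. */
static void check_all_pairs(void)
{
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            int ref_abs = a > b ? a - b : b - a;
            assert(avg_floor_u8((uint8_t)a, (uint8_t)b) == (a + b) / 2);
            assert(absdiff_u8((uint8_t)a, (uint8_t)b) == ref_abs);
        }
    }
}
```

Calling check_all_pairs() walks all 65536 input pairs, which is a cheap way to gain confidence in an identity like this before vectorizing it.
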
> >
> > In the remaining 3% of cases the spatial_score needs more than 8 bits
> > to store, so we have a slow path, where the spatial score is computed
> > using packed words instead.
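
The fast-path/slow-path split described above can be sketched in scalar C as follows. This is a sketch under assumptions, not the patch's code: it assumes the score is a sum of three byte absolute differences, and it detects a possible overflow by checking whether a saturating 8-bit sum reached 255. How the patch actually detects the overflow is not stated in this message, and all names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16 /* bytes in one 128-bit xmm register */

static uint8_t absdiff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : b - a);
}

/* Unsigned saturating add, the scalar analogue of PADDUSB. */
static uint8_t addus_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255 ? 255 : (uint8_t)s;
}

/* Fast path: 8-bit scores for all lanes. Returns 1 if every lane is known
 * to be exact, 0 if any lane reached 255 and may have saturated.
 * p and q must each provide LANES + 2 readable bytes. */
static int score_fast_u8(const uint8_t *p, const uint8_t *q, uint8_t *score)
{
    int exact = 1;
    for (int i = 0; i < LANES; i++) {
        uint8_t s = addus_u8(absdiff_u8(p[i], q[i]),
                             absdiff_u8(p[i + 1], q[i + 1]));
        s = addus_u8(s, absdiff_u8(p[i + 2], q[i + 2]));
        score[i] = s;
        if (s == 255)
            exact = 0; /* conservative: an exact 255 also takes the slow path */
    }
    return exact;
}

/* Slow path: the same sum computed with 16-bit intermediates. */
static void score_slow_u16(const uint8_t *p, const uint8_t *q, uint16_t *score)
{
    for (int i = 0; i < LANES; i++)
        score[i] = (uint16_t)(absdiff_u8(p[i], q[i]) +
                              absdiff_u8(p[i + 1], q[i + 1]) +
                              absdiff_u8(p[i + 2], q[i + 2]));
}

/* Try the byte path first and redo the whole vector in words if needed. */
static void spatial_score(const uint8_t *p, const uint8_t *q, uint16_t *score)
{
    uint8_t narrow[LANES];
    if (score_fast_u8(p, q, narrow)) {
        for (int i = 0; i < LANES; i++)
            score[i] = narrow[i];
    } else {
        score_slow_u16(p, q, score);
    }
}
```

The fallback is taken for the whole vector rather than per lane, which matches the "entire 128-bit xmm vector" wording above: a single wide lane sends all 16 lanes down the slow path.
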
> >
> > This implementation is currently limited to x86_64 due to the number
> > of registers required. x86_32 is possible, but the performance benefit
> > over the existing SSSE3 implementation is not as great, due to all of the
> > stack spills that would result from having far fewer registers. ASM was
> > not generated for the 32-bit variant due to limited ROI, as most AVX
> > users are likely on a 64-bit OS at this point and 32-bit users would
> > lose out on most of the performance benefit.
> >
> > Signed-off-by: Chris Phlipot <cphlipot0 at gmail.com>
>
> There's no need to support 32-bit, but the ffmpeg build must not break
> on Linux x86-32.
>
> src/libavfilter/x86/vf_yadif_x64.asm:145: error: impossible combination of
> address sizes
> src/libavfilter/x86/vf_yadif_x64.asm:145: error: invalid effective address
> src/libavfilter/x86/vf_yadif_x64.asm:146: error: impossible combination of
> address sizes
> src//libavutil/x86/x86inc.asm:1399: ... from macro `movdqu' defined here
> src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined
> here
> src//libavutil/x86/x86inc.asm:1717: ... from macro `vmovdqu' defined here
>
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Everything should be made as simple as possible, but not simpler.
> -- Albert Einstein