[FFmpeg-devel] [PATCH v2 00/15] avfilter/vf_bwdif: Add aarch64 neon functions

Martin Storsjö martin at martin.st
Mon Jul 3 00:09:52 EEST 2023


On Sun, 2 Jul 2023, John Cox wrote:

> Also adds a filter_line3 method which on aarch64 neon yields approx 30%
> speedup over 2xfilter_line and a memcpy
>
> Differences from v1:
> .align 16 corrected to .balign 16
> SXTW tolower
> Mac ABI (hopefully) fixed
> V register pop/push macroed & prettified
>
> John Cox (15):
>  avfilter/vf_bwdif: Add outline for aarch neon functions
>  avfilter/vf_bwdif: Add common macros and consts for aarch64 neon
>  avfilter/vf_bwdif: Export C filter_intra
>  avfilter/vf_bwdif: Add neon for filter_intra
>  tests/checkasm: Add test for vf_bwdif filter_intra
>  avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon
>  avfilter/vf_bwdif: Export C filter_edge
>  avfilter/vf_bwdif: Add neon for filter_edge
>  tests/checkasm: Add test for vf_bwdif filter_edge
>  avfilter/vf_bwdif: Export C filter_line
>  avfilter/vf_bwdif: Add neon for filter_line
>  avfilter/vf_bwdif: Add a filter_line3 method for optimisation
>  avfilter/vf_bwdif: Add neon for filter_line3
>  tests/checkasm: Add test for vf_bwdif filter_line3
>  avfilter/vf_bwdif: Block filter slices into a multiple of 4 lines

Overall, I'd suggest squashing/reordering the patches like this:

- tests/checkasm: Add test for vf_bwdif filter_intra
- avfilter/vf_bwdif: Add neon for filter_intra
   (With the preceding patches squashed into it. Of the common macros, add
   only the ones this patch actually uses.)
- tests/checkasm: Add test for vf_bwdif filter_edge
- avfilter/vf_bwdif: Add neon for filter_edge (with other dependencies
   squashed)
- avfilter/vf_bwdif: Add neon for filter_line
- avfilter/vf_bwdif: Add a filter_line3 method for optimisation
   + checkasm test squashed
- avfilter/vf_bwdif: Add neon for filter_line3

// Martin


