[FFmpeg-devel] [PATCH v2] x86/tx_float: implement inverse MDCT AVX2 assembly
Lynne
dev at lynne.ee
Fri Sep 2 08:55:30 EEST 2022
Sep 2, 2022, 07:49 by dev at lynne.ee:
> Version 2 notes: halved the amount of loads and loops for the
> pre-transform loop by exploiting the symmetry.
>
> This commit implements an iMDCT in pure assembly.
>
> This is capable of processing any mod-8 transform, rather than just
> powers of two, but since powers of two are all we have assembly for
> currently, that's what's supported.
> It would really benefit if we could somehow use the C code to decide
> which function to jump into, but exposing function labels from assembly
> into C is anything but easy.
> The post-transform loop could probably be improved.
>
> This was somewhat annoying to write, as we must support arbitrary
> strides during runtime. There's a fast branch for stride == 4 bytes
> and a slower one which uses vgatherdps.
>
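
To illustrate the two stride paths mentioned above, here is a minimal C
sketch of the dispatch. It is not the assembly from the patch: the
function name load_strided and its signature are hypothetical, and the
slow path is only a scalar stand-in for what vgatherdps does on
8 lanes at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of libavutil: read n floats from a buffer
 * whose consecutive elements are `stride` bytes apart. */
void load_strided(float *dst, const void *src, size_t n, ptrdiff_t stride)
{
    const uint8_t *p = src;

    if (stride == (ptrdiff_t)sizeof(float)) {
        /* Fast branch: stride == 4 bytes, so the input is contiguous and
         * can be read with plain loads. */
        memcpy(dst, p, n * sizeof(float));
        return;
    }

    /* Slow branch: every element sits at its own byte offset. This loop is
     * the scalar equivalent of a gather with per-lane offsets. */
    for (size_t i = 0; i < n; i++)
        memcpy(&dst[i], p + (ptrdiff_t)i * stride, sizeof(float));
}
```

With a stride of 8 bytes on a plain float array, this picks up every
second element, which is the case the gather path exists for.
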
> Zen 3 benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx):
>
> 128pt:
> 2815 decicycles in av_tx (imdct), 16776766 runs, 450 skips
> 3097 decicycles in av_imdct_half, 16776745 runs, 471 skips
>
> 256pt:
> 4931 decicycles in av_tx (imdct), 4193127 runs, 1177 skips
> 5401 decicycles in av_imdct_half, 2097058 runs, 94 skips
>
> 512pt:
> 9764 decicycles in av_tx (imdct), 4193929 runs, 375 skips
> 10690 decicycles in av_imdct_half, 2096948 runs, 204 skips
>
> 1024pt:
> 20113 decicycles in av_tx (imdct), 4194202 runs, 102 skips
> 21258 decicycles in av_imdct_half, 2097147 runs, 5 skips
>
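
For reading the numbers above, the relative gain is just old divided by
new. A small C sketch follows; the helper name speedup is made up for
illustration and is not part of any FFmpeg API.

```c
/* Hypothetical helper: ratio of the old timing to the new timing.
 * A result above 1.0 means the new code is faster. */
double speedup(double old_decicycles, double new_decicycles)
{
    return old_decicycles / new_decicycles;
}
```

Fed with the figures quoted above (for example 3097 and 2815 for 128pt),
this works out to roughly 1.10x at 128pt, 256pt and 512pt, and roughly
1.06x at 1024pt.
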
> Patch attached.
>
Forgot to git add some minor reordering/fma changes.
W/e.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v2-0001-x86-tx_float-implement-inverse-MDCT-AVX2-assembly.patch
Type: text/x-diff
Size: 12781 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20220902/16b3811f/attachment.patch>