[FFmpeg-devel] [PATCH] x86/tx_float: implement inverse MDCT AVX2 assembly
Lynne
dev at lynne.ee
Fri Sep 2 00:47:08 EEST 2022
This commit implements an iMDCT in pure assembly.
It can process any mod-8 transform, not just power-of-two lengths,
but since power-of-two transforms are the only ones we currently have
assembly for, those are the only lengths supported.
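To make the length restriction concrete, here is a minimal sketch in C. It
assumes "mod-8" means "length divisible by 8", which the text above does not
spell out. All function names are hypothetical and are not taken from the
patch or from libavutil.

```c
#include <stdbool.h>

/* Assumption: "mod-8" means the transform length is a multiple of 8. */
static bool is_mod8(int len)
{
    return len > 0 && len % 8 == 0;
}

/* A positive integer is a power of two if it has exactly one bit set. */
static bool is_pow2(int len)
{
    return len > 0 && (len & (len - 1)) == 0;
}

/* Hypothetical gate: the loop itself would accept any mod-8 length,
 * but only power-of-two sub-transforms exist in assembly, so both
 * conditions have to hold for now. */
static bool asm_imdct_usable(int len)
{
    return is_mod8(len) && is_pow2(len);
}
```

Under these assumptions a length such as 120 passes the mod-8 check but is
still rejected, while 128 passes both.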
It would really benefit if we could somehow use the C code to decide
which function to jump into, but exposing function labels from assembly
to C is anything but easy.
The post-transform loop could probably be improved.
This was somewhat annoying to write, as we must support arbitrary
strides at runtime. There's a fast branch for stride == 4 bytes
(contiguous floats) and a slower one which uses vgatherdps.
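The two branches can be illustrated with a scalar C sketch. This is not the
code from the patch: load_strided is a hypothetical helper, and the real
assembly may read its input in a different order. It only shows why a 4-byte
stride allows a plain contiguous load while any other stride needs a
per-element fetch, which is what vgatherdps does for each lane.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n floats from src, where consecutive
 * elements are `stride` bytes apart, into the contiguous array dst. */
static void load_strided(float *dst, const void *src, size_t n,
                         ptrdiff_t stride)
{
    const unsigned char *p = src;

    if (stride == (ptrdiff_t)sizeof(float)) {
        /* Fast branch: the input is contiguous, one block copy suffices. */
        memcpy(dst, src, n * sizeof(float));
        return;
    }

    /* Slow branch: fetch every element from its own byte offset,
     * the scalar equivalent of a gather. */
    for (size_t i = 0; i < n; i++)
        memcpy(&dst[i], p + (ptrdiff_t)i * stride, sizeof(float));
}
```

Calling load_strided(out, in, 4, 8) on an array of floats picks every second
element, while a stride of 4 takes the block-copy path.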
Benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx):
128pt:
2791 decicycles in av_tx (imdct), 16775675 runs, 1541 skips
3024 decicycles in av_imdct_half, 16776779 runs, 437 skips
256pt:
5055 decicycles in av_tx (imdct), 2096602 runs, 550 skips
5324 decicycles in av_imdct_half, 2097046 runs, 106 skips
512pt:
9922 decicycles in av_tx (imdct), 2096983 runs, 169 skips
10390 decicycles in av_imdct_half, 2097002 runs, 150 skips
1024pt:
20482 decicycles in av_tx (imdct), 2097089 runs, 63 skips
20662 decicycles in av_imdct_half, 2097115 runs, 37 skips
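For reading the numbers above, the relative saving per size can be computed
as (old - new) / old. The sketch below does this in C using the decicycle
counts quoted in this mail. The helper names are hypothetical. The results
come out to roughly 7.7%, 5.1%, 4.5% and 0.9% for 128, 256, 512 and 1024
points respectively.

```c
#include <stdio.h>

/* Percentage of cycles saved by the new code relative to the old code. */
static double percent_saved(double old_cycles, double new_cycles)
{
    return 100.0 * (old_cycles - new_cycles) / old_cycles;
}

/* Print the saving for each transform size listed in the benchmarks. */
static void print_savings(void)
{
    static const struct {
        int    len;     /* transform size in points       */
        double old_dc;  /* av_imdct_half, in decicycles   */
        double new_dc;  /* av_tx (imdct), in decicycles   */
    } runs[] = {
        {  128,  3024,  2791 },
        {  256,  5324,  5055 },
        {  512, 10390,  9922 },
        { 1024, 20662, 20482 },
    };

    for (size_t i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
        printf("%4dpt: %.1f%% fewer decicycles\n", runs[i].len,
               percent_saved(runs[i].old_dc, runs[i].new_dc));
}
```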
Patch attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-x86-tx_float-implement-inverse-MDCT-AVX2-assembly.patch
Type: text/x-diff
Size: 11996 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20220901/b265796d/attachment.patch>