[FFmpeg-devel] [PATCH 2/2] x86/tx_float: implement inverse MDCT AVX2 assembly

Lynne dev at lynne.ee
Sun Sep 4 00:35:24 EEST 2022


Sep 3, 2022, 22:55 by michael at niedermayer.cc:

> On Sat, Sep 03, 2022 at 03:42:36AM +0200, Lynne wrote:
>
>> This commit implements an iMDCT in pure assembly.
>>
>> This is capable of processing any mod-8 transforms, rather than just
>> power of two, but since power of two is all we have assembly for
>> currently, that's what's supported.
>> It would really benefit if we could somehow use the C code to decide
>> which function to jump into, but exposing function labels from assembly
>> into C is anything but easy.
>> The post-transform loop could probably be improved.
>>
>> This was somewhat annoying to write, as we must support arbitrary
>> strides during runtime. There's a fast branch for stride == 4 bytes
>> and a slower one which uses vgatherdps.
>>
>> Zen 3 benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx):
>>
>> 128pt:
>>    2811 decicycles in         av_tx (imdct),16775916 runs,   1300 skips
>>    3082 decicycles in         av_imdct_half,16776751 runs,    465 skips
>>
>> 256pt:
>>    4920 decicycles in         av_tx (imdct),16775820 runs,   1396 skips
>>    5378 decicycles in         av_imdct_half,16776411 runs,    805 skips
>>
>> 512pt:
>>    9668 decicycles in         av_tx (imdct),16775774 runs,   1442 skips
>>   10626 decicycles in         av_imdct_half,16775647 runs,   1569 skips
>>
>> 1024pt:
>>   19812 decicycles in         av_tx (imdct),16777144 runs,     72 skips
>>   23036 decicycles in         av_imdct_half,16777167 runs,     49 skips
>>
>> Patch attached.
>>
>
> x86-32 doesn't digest this very well
>

Thanks for checking. I've ifdef'd it out of 32-bit builds and also fixed
a small issue where asm functions were being picked for non-asm calls.
Attached.
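
For anyone skimming the quoted commit message, the two stride branches
can be pictured roughly as below. This is only a scalar C sketch of the
idea, not code from the patch: the name load_strided and its signature
are made up for illustration, and the real implementation does this in
AVX2 assembly with vector loads and vgatherdps.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: copy `n` floats whose
 * consecutive elements are `stride` bytes apart into a dense buffer. */
static void load_strided(float *dst, const void *src,
                         ptrdiff_t stride, size_t n)
{
    if (stride == (ptrdiff_t)sizeof(float)) {
        /* Fast branch: the input is contiguous, so a plain block copy
         * is enough (ordinary vector loads in the asm case). */
        memcpy(dst, src, n * sizeof(float));
    } else {
        /* Slow branch: fetch every element from its own byte offset.
         * A gather instruction such as vgatherdps does the same thing
         * for eight 32-bit lanes at a time. */
        const unsigned char *p = src;
        for (size_t i = 0; i < n; i++)
            memcpy(&dst[i], p + (ptrdiff_t)i * stride, sizeof(float));
    }
}
```

With stride == 4 the call degenerates into a single copy, which is why
that case gets its own branch; any other stride pays for per-element
addressing.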

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-x86-tx_float-add-support-for-calling-assembly-functi.patch
Type: text/x-diff
Size: 14023 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20220903/9b5a0c75/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-x86-tx_float-implement-inverse-MDCT-AVX2-assembly.patch
Type: text/x-diff
Size: 12620 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20220903/9b5a0c75/attachment-0001.patch>

