[FFmpeg-devel] [PATCH] x86/tx_float: implement inverse MDCT AVX2 assembly

Fri Sep 2 00:47:08 EEST 2022

This commit implements an iMDCT in pure assembly.

This can process any mod-8 transform length, rather than just powers
of two, but since power-of-two transforms are all we have assembly for
currently, that's all that's supported.
It would really help if we could somehow use the C code to decide
which function to jump into, but exposing function labels from assembly
to C is anything but easy.
The post-transform loop could probably be improved.

This was somewhat annoying to write, as we must support arbitrary
strides at runtime. There's a fast branch for stride == 4 bytes
(contiguous floats) and a slower one which uses vgatherdps.
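
To illustrate the two branches, here is a minimal scalar sketch in C. It is
not the code from the patch: load8_strided is a hypothetical helper, and the
loop in the slow path only stands in for what vgatherdps does in one
instruction. The stride is in bytes, as in the description above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: fetch 8 floats starting at
 * `in`, where consecutive elements are `stride` bytes apart. */
static void load8_strided(float dst[8], const void *in, ptrdiff_t stride)
{
    if (stride == (ptrdiff_t)sizeof(float)) {
        /* Fast branch: the input is contiguous, so one plain 32-byte copy
         * (a single vector load in assembly) is enough. */
        memcpy(dst, in, 8 * sizeof(float));
    } else {
        /* Slow branch: fetch each element from its own byte offset.
         * This is the scalar equivalent of a gather. */
        const unsigned char *p = in;
        for (int i = 0; i < 8; i++)
            memcpy(&dst[i], p + i * stride, sizeof(float));
    }
}
```

Checking the stride once per call, rather than once per element, keeps the
common contiguous case free of any gather overhead.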

Benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx):

128pt:
   2791 decicycles in         av_tx (imdct),16775675 runs,   1541 skips
   3024 decicycles in         av_imdct_half,16776779 runs,    437 skips

256pt:
   5055 decicycles in         av_tx (imdct), 2096602 runs,    550 skips
   5324 decicycles in         av_imdct_half, 2097046 runs,    106 skips

512pt:
   9922 decicycles in         av_tx (imdct), 2096983 runs,    169 skips
  10390 decicycles in         av_imdct_half, 2097002 runs,    150 skips

1024pt:
  20482 decicycles in         av_tx (imdct), 2097089 runs,     63 skips
  20662 decicycles in         av_imdct_half, 2097115 runs,     37 skips
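
For comparing the figures above, the relative saving can be computed with a
small helper like the one below. percent_saved is a hypothetical name used
only for illustration. It takes the old and new timings in the same unit.

```c
/* Hypothetical helper: percentage of time saved by `new_t` relative to
 * `old_t`. Both arguments must use the same unit (e.g. decicycles). */
static double percent_saved(double old_t, double new_t)
{
    return 100.0 * (old_t - new_t) / old_t;
}
```

With the numbers listed, percent_saved(3024, 2791) is about 7.7 for 128pt
and percent_saved(20662, 20482) is about 0.9 for 1024pt.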

Patch attached.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-x86-tx_float-implement-inverse-MDCT-AVX2-assembly.patch
Type: text/x-diff
Size: 11996 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20220901/b265796d/attachment.patch>