[FFmpeg-devel] [PATCH 2/2] x86/tx_float: implement inverse MDCT AVX2 assembly

Michael Niedermayer michael at niedermayer.cc
Sat Sep 3 23:55:38 EEST 2022


On Sat, Sep 03, 2022 at 03:42:36AM +0200, Lynne wrote:
> This commit implements an iMDCT in pure assembly.
> 
> This is capable of processing any mod-8 transforms, rather than just
> power of two, but since power of two is all we have assembly for
> currently, that's what's supported.
> It would really benefit if we could somehow use the C code to decide
> which function to jump into, but exposing function labels from assembly
> into C is anything but easy.
> The post-transform loop could probably be improved.
> 
> This was somewhat annoying to write, as we must support arbitrary
> strides during runtime. There's a fast branch for stride == 4 bytes
> and a slower one which uses vgatherdps.
> 
> Zen 3 benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx):
> 
> 128pt:
>    2811 decicycles in         av_tx (imdct),16775916 runs,   1300 skips
>    3082 decicycles in         av_imdct_half,16776751 runs,    465 skips
> 
> 256pt:
>    4920 decicycles in         av_tx (imdct),16775820 runs,   1396 skips
>    5378 decicycles in         av_imdct_half,16776411 runs,    805 skips
> 
> 512pt:
>    9668 decicycles in         av_tx (imdct),16775774 runs,   1442 skips
>   10626 decicycles in         av_imdct_half,16775647 runs,   1569 skips
> 
> 1024pt:
>   19812 decicycles in         av_tx (imdct),16777144 runs,     72 skips
>   23036 decicycles in         av_imdct_half,16777167 runs,     49 skips
> 
> Patch attached.
> 
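
For readers following the stride handling described in the quoted commit message, here is a minimal C sketch of the idea: a contiguous fast path when the stride equals 4 bytes, and a generic per-element path otherwise (the scalar analogue of a gather). The function name and signature are hypothetical and are not taken from the patch; the real code does this in AVX2 assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: read n floats from src, stepping stride_bytes
 * between consecutive elements, and store them contiguously in dst. */
static void sketch_load_strided(float *dst, const float *src,
                                size_t n, ptrdiff_t stride_bytes)
{
    assert(dst && src);

    /* Fast branch: elements are packed, so one bulk copy is enough. */
    if (stride_bytes == (ptrdiff_t)sizeof(float)) {
        memcpy(dst, src, n * sizeof(float));
        return;
    }

    /* Slow branch: fetch each element from its own byte offset.
     * This is what a gather instruction does for several lanes at once. */
    const unsigned char *base = (const unsigned char *)src;
    for (size_t i = 0; i < n; i++)
        memcpy(&dst[i], base + (ptrdiff_t)i * stride_bytes, sizeof(float));
}
```

The split exists because the packed case can use plain vector loads, while an arbitrary stride forces one address computation per element.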

x86-32 doesn't digest this patch very well:

src/libavutil/x86/tx_float.asm:1540: error: (ASSERT:2) assertion ``8 <= 7'' failed
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:618: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:304: ... from macro `ASSERT' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:620: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:382: ... from macro `ALLOC_STACK' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:621: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:273: ... from macro `LOAD_IF_USED' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:621: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:273: ... from macro `LOAD_IF_USED' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:621: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:273: ... from macro `LOAD_IF_USED' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:621: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:273: ... from macro `LOAD_IF_USED' defined here
src/libavutil/x86/tx_float.asm:1540: error: invalid combination of opcode and operands
src/libavutil/x86/tx_float.asm:1362: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7d' undefined
src/libavutil/x86/tx_float.asm:1365: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7d' undefined
src/libavutil/x86/tx_float.asm:1366: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r11q' undefined
src/libavutil/x86/tx_float.asm:1369: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7d' undefined
src/libavutil/x86/tx_float.asm:1380: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1396: ... from macro `movd' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1383: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m15' undefined
src/libavutil/x86/tx_float.asm:1384: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1459: ... from macro `pcmpeqd' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m15' undefined
src/libavutil/x86/tx_float.asm:1388: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m15' undefined
src/libavutil/x86/tx_float.asm:1390: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m8' undefined
src/libavutil/x86/tx_float.asm:1394: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m9' undefined
src/libavutil/x86/tx_float.asm:1395: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m10' undefined
src/libavutil/x86/tx_float.asm:1403: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1414: ... from macro `movshdup' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m11' undefined
src/libavutil/x86/tx_float.asm:1404: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1414: ... from macro `movshdup' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m12' undefined
src/libavutil/x86/tx_float.asm:1405: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1415: ... from macro `movsldup' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m13' undefined
src/libavutil/x86/tx_float.asm:1406: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1415: ... from macro `movsldup' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m10' undefined
src/libavutil/x86/tx_float.asm:1408: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1421: ... from macro `mulps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m11' undefined
src/libavutil/x86/tx_float.asm:1409: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1421: ... from macro `mulps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m10' undefined
src/libavutil/x86/tx_float.asm:1411: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1562: ... from macro `shufps' defined here
src//libavutil/x86/x86inc.asm:1262: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m11' undefined
src/libavutil/x86/tx_float.asm:1412: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1562: ... from macro `shufps' defined here
src//libavutil/x86/x86inc.asm:1262: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m10' undefined
src/libavutil/x86/tx_float.asm:1414: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1668: ... from macro `fmaddsubps' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m11' undefined
src/libavutil/x86/tx_float.asm:1415: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1668: ... from macro `fmaddsubps' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1417: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1398: ... from macro `movdqa' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1716: ... from macro `vmovdqa' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1418: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1398: ... from macro `movdqa' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1716: ... from macro `vmovdqa' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1422: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1140: ... from macro `add' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1430: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1431: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1432: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1436: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1444: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m8' undefined
src/libavutil/x86/tx_float.asm:1449: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1564: ... from macro `shufps' defined here
src//libavutil/x86/x86inc.asm:1260: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m8' undefined
src/libavutil/x86/tx_float.asm:1452: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1421: ... from macro `mulps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: invalid combination of opcode and operands
src/libavutil/x86/tx_float.asm:1461: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: invalid combination of opcode and operands
src/libavutil/x86/tx_float.asm:1462: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r9q' undefined
src/libavutil/x86/tx_float.asm:1463: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r10q' undefined
src/libavutil/x86/tx_float.asm:1464: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r9q' undefined
src/libavutil/x86/tx_float.asm:1468: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1405: ... from macro `movlps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r10q' undefined
src/libavutil/x86/tx_float.asm:1469: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1402: ... from macro `movhps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1471: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1472: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r9q' undefined
src/libavutil/x86/tx_float.asm:1473: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r10q' undefined
src/libavutil/x86/tx_float.asm:1474: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r9q' undefined
src/libavutil/x86/tx_float.asm:1478: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1405: ... from macro `movlps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r10q' undefined
src/libavutil/x86/tx_float.asm:1479: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1402: ... from macro `movhps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1484: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1152: ... from macro `sub' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1485: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1152: ... from macro `sub' defined here
src/libavutil/x86/tx_float.asm:1540: error: invalid combination of opcode and operands
src/libavutil/x86/tx_float.asm:1489: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1490: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r11q' undefined
src/libavutil/x86/tx_float.asm:1492: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1118: ... from macro `call' defined here
src//libavutil/x86/x86inc.asm:1130: ... from macro `call_internal' defined here
src/libavutil/x86/tx_float.asm:1540: error: invalid combination of opcode and operands
src/libavutil/x86/tx_float.asm:1495: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1499: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1500: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1505: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1506: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1507: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1508: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1530: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1531: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1533: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1140: ... from macro `add' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1534: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1152: ... from macro `sub' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1537: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:638: ... from macro `RET' defined here
src/ffbuild/common.mak:103: recipe for target 'libavutil/x86/tx_float.o' failed
make: *** [libavutil/x86/tx_float.o] Error 1
make: *** Waiting for unfinished jobs....
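
The errors above are consistent with the macro using registers that exist only on x86-64: x86inc provides r0-r6 and m0-m7 on x86-32, while the failing lines reference r7 through r11 and m8 and above. One common way to handle this in FFmpeg is to wrap such code in `%if ARCH_X86_64` ... `%endif` on the assembly side and to guard the matching C registration the same way. Below is a minimal C sketch of that C-side guard only. All names are hypothetical, `SKETCH_ARCH_X86_64` stands in for the build system's architecture macro, and the function bodies are placeholders, not real transforms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: derive the architecture flag from compiler-predefined macros
 * so the sketch builds on its own; a real build would take it from config. */
#if defined(__x86_64__) || defined(_M_X64)
#define SKETCH_ARCH_X86_64 1
#else
#define SKETCH_ARCH_X86_64 0
#endif

typedef void (*sketch_tx_fn)(float *out, const float *in, size_t len);

/* Placeholder for the portable C implementation (just copies the data). */
static void sketch_imdct_c(float *out, const float *in, size_t len)
{
    assert(out && in);
    memcpy(out, in, len * sizeof(float));
}

#if SKETCH_ARCH_X86_64
/* Placeholder for the symbol that the 64-bit-only assembly would export. */
static void sketch_imdct_avx2(float *out, const float *in, size_t len)
{
    assert(out && in);
    memcpy(out, in, len * sizeof(float));
}
#endif

/* Pick an implementation; the 64-bit-only one is never referenced,
 * and therefore never needs to assemble, on 32-bit targets. */
static sketch_tx_fn sketch_select_imdct(int have_avx2)
{
#if SKETCH_ARCH_X86_64
    if (have_avx2)
        return sketch_imdct_avx2;
#else
    (void)have_avx2;
#endif
    return sketch_imdct_c;
}
```

With a guard of this shape, a 32-bit build falls back to the C path instead of failing in the assembler.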

thx

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein