[FFmpeg-cvslog] x86/tx_float: save a branch during coefficient deinterleaving
Lynne
git at videolan.org
Tue Aug 9 04:37:10 EEST 2022
ffmpeg | branch: master | Lynne <dev at lynne.ee> | Tue Aug 9 03:31:11 2022 +0200| [98b32ef462ba344b99034f7f85c2d66cfd7f0055] | committer: Lynne
x86/tx_float: save a branch during coefficient deinterleaving
Directly branch into the special 64-point deinterleave
subroutine rather than going through the general deinterleave.
64-point transform timings on Zen 3:
Before:
1974 decicycles in av_tx (fft),16776864 runs, 352 skips
After:
1956 decicycles in av_tx (fft),16775378 runs, 1838 skips
> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=98b32ef462ba344b99034f7f85c2d66cfd7f0055
---
libavutil/x86/tx_float.asm | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/libavutil/x86/tx_float.asm b/libavutil/x86/tx_float.asm
index 21f99d3945..191af7d68f 100644
--- a/libavutil/x86/tx_float.asm
+++ b/libavutil/x86/tx_float.asm
@@ -1044,7 +1044,7 @@ ALIGN 16
add lutq, (mmsize/2)*8
%endif
cmp tgtq, 64
- je .deinterleave
+ je .64pt_deint
SPLIT_RADIX_COMBINE_64
@@ -1190,9 +1190,6 @@ FFT_SPLIT_RADIX_DEF 131072
; Final synthesis + deinterleaving code
;===============================================================================
.deinterleave:
- cmp lenq, 64
- je .64pt_deint
-
imul tmpq, lenq, 2
lea lutq, [4*lenq + tmpq]
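
The control-flow change in the diff can be sketched as a small C model. This is only an illustration, not FFmpeg code: the names `route_before`, `route_after`, `struct route` and `enum deint` are hypothetical, and the model assumes that `tgtq` and `lenq` hold the same value when a plain 64-point transform reaches these instructions, which appears to be what the commit relies on. It counts only the two compare-and-branch pairs visible in the diff.

```c
#include <assert.h>

/* Hypothetical model, not FFmpeg code: records which deinterleave
 * routine is reached and how many of the compare-and-branch pairs
 * shown in the diff are executed on the way there. */
enum deint { DEINT_GENERAL, DEINT_64PT };

struct route {
    enum deint dest;   /* routine that ends up running */
    int branches;      /* compare-and-branch pairs executed */
};

/* Before the patch: "cmp tgtq, 64 / je .deinterleave" only selects the
 * general entry point, and ".deinterleave" then re-checks the length
 * with "cmp lenq, 64 / je .64pt_deint". */
static struct route route_before(int tgt, int len)
{
    struct route r = { DEINT_GENERAL, 0 };
    (void)tgt;             /* in this model every path reaches .deinterleave */
    r.branches++;          /* cmp tgtq, 64 */
    r.branches++;          /* cmp lenq, 64 */
    if (len == 64)
        r.dest = DEINT_64PT;
    return r;
}

/* After the patch: "cmp tgtq, 64 / je .64pt_deint" jumps straight to
 * the special routine, and ".deinterleave" no longer re-checks lenq. */
static struct route route_after(int tgt, int len)
{
    struct route r = { DEINT_GENERAL, 0 };
    (void)len;             /* assumed equal to tgt for the 64-point case */
    r.branches++;          /* cmp tgtq, 64 */
    if (tgt == 64)
        r.dest = DEINT_64PT;
    return r;
}
```

Under these assumptions both versions pick the same routine for a given length, while the patched version executes one compare-and-branch pair fewer. That matches the direction of the Zen 3 timings in the commit message, although the model says nothing about how many cycles are saved.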