[FFmpeg-devel] TEXTRELs in fft_mmx.asm (was Re: [PATCH] split-radix FFT)
Dominik 'Rathann' Mierzejewski
dominik
Sat Nov 1 22:21:01 CET 2008
Hi.
Sorry for dragging out an old thread, but...
[...]
> From 548a4d20ec39d14829f46ae2ee0325908d095097 Mon Sep 17 00:00:00 2001
> From: Loren Merritt <pengvado at akuvian.org>
> Date: Wed, 23 Jul 2008 22:55:09 -0600
> Subject: [PATCH] split-radix FFT
> c is 1.9x faster than previous c (on various x86 cpus), sse is 1.6x faster than previous sse.
>
> ---
> libavcodec/Makefile | 5 +
> libavcodec/dsputil.h | 9 +-
> libavcodec/fft.c | 371 ++++++++++++++++++++++------------
> libavcodec/i386/fft_3dn.c | 111 +----------
> libavcodec/i386/fft_3dn2.c | 110 ++---------
> libavcodec/i386/fft_mmx.asm | 467 +++++++++++++++++++++++++++++++++++++++++++
> libavcodec/i386/fft_sse.c | 149 ++++----------
> 7 files changed, 783 insertions(+), 439 deletions(-)
> create mode 100644 libavcodec/i386/fft_mmx.asm
>
[...]
> diff --git a/libavcodec/i386/fft_mmx.asm b/libavcodec/i386/fft_mmx.asm
> new file mode 100644
> index 0000000..c0a9bd5
> --- /dev/null
> +++ b/libavcodec/i386/fft_mmx.asm
[...]
> +%macro DECL_FFT 2-3 ; nbits, cpu, suffix
> +%xdefine list_of_fft fft4%2, fft8%2
> +%if %1==5
> +%xdefine list_of_fft list_of_fft, fft16%2
> +%endif
> +
> +%assign n 1<<%1
> +%rep 17-%1
> +%assign n2 n/2
> +%assign n4 n/4
> +%xdefine list_of_fft list_of_fft, fft %+ n %+ %3%2
> +
> +align 16
> +fft %+ n %+ %3%2:
> + call fft %+ n2 %+ %2
> + add r0, n*4 - (n&(-2<<%1))
> + call fft %+ n4 %+ %2
> + add r0, n*2 - (n2&(-2<<%1))
> + call fft %+ n4 %+ %2
> + sub r0, n*6 + (n2&(-2<<%1))
> + lea r1, [ff_cos_ %+ n GLOBAL]
> + mov r2d, n4/2
> + jmp pass%3%2
> +
> +%assign n n*2
> +%endrep
> +%undef n
> +
> +align 8
> +dispatch_tab%3%2: pointer list_of_fft
> +
> +; On x86_32, this function does the register saving and restoring for all of fft.
> +; The others pass args in registers and don't spill anything.
> +cglobal ff_fft_dispatch%3%2, 2,5,0, z, nbits
> + lea r2, [dispatch_tab%3%2 GLOBAL]
> + mov r2, [r2 + (nbitsq-2)*gprsize]
> + call r2
> + RET
> +%endmacro ; DECL_FFT
> +
> +DECL_FFT 5, _sse
> +DECL_FFT 5, _sse, _interleave
> +DECL_FFT 4, _3dn
> +DECL_FFT 4, _3dn, _interleave
> +DECL_FFT 4, _3dn2
> +DECL_FFT 4, _3dn2, _interleave
... these six DECL_FFT invocations seem to be causing text relocations (textrels) even on x86_64.
I've already given up on avoiding textrels in FFmpeg on x86_32,
but on x86_64 this is the only problematic case.
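To make the quoted macro easier to follow, here is a rough Python model of how DECL_FFT builds list_of_fft, the list that becomes the pointer table dispatch_tab. The function name dispatch_entries and the INVOCATIONS list are made up for this illustration. The idea that these pointer tables are what needs relocating is an assumption on my part, not something confirmed in this thread.

```python
# Rough model of the %xdefine/%rep logic in DECL_FFT (illustration only).
# dispatch_entries and INVOCATIONS are hypothetical names, not FFmpeg code.

def dispatch_entries(nbits, cpu, suffix=""):
    """Return the symbol names that DECL_FFT nbits, cpu, suffix puts in list_of_fft."""
    entries = [f"fft4{cpu}", f"fft8{cpu}"]
    if nbits == 5:
        # %if %1==5 adds fft16 without the suffix
        entries.append(f"fft16{cpu}")
    n = 1 << nbits                      # %assign n 1<<%1
    for _ in range(17 - nbits):         # %rep 17-%1
        entries.append(f"fft{n}{suffix}{cpu}")
        n *= 2                          # %assign n n*2
    return entries

# The six invocations at the end of the quoted patch.
INVOCATIONS = [
    (5, "_sse", ""), (5, "_sse", "_interleave"),
    (4, "_3dn", ""), (4, "_3dn", "_interleave"),
    (4, "_3dn2", ""), (4, "_3dn2", "_interleave"),
]

if __name__ == "__main__":
    total = 0
    for nbits, cpu, suffix in INVOCATIONS:
        names = dispatch_entries(nbits, cpu, suffix)
        total += len(names)
        print(f"dispatch_tab{suffix}{cpu}: {len(names)} pointers, "
              f"{names[0]} .. {names[-1]}")
    print("pointer slots in all six tables:", total)
```

Under this model every table holds one absolute pointer per FFT size, so each slot would need its own relocation if the table ends up in a read-only segment.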
Here's how I found them:
$ ./configure --enable-shared --disable-static --enable-gpl --enable-swscale --enable-postproc --enable-avfilter --enable-avfilter-lavf --enable-pthreads
$ make
...
yasm -f elf -DARCH_X86_64 -m amd64 -DPIC -g dwarf2 -I i386/ -o i386/fft_mmx.o i386/fft_mmx.asm
...
Note that fft_mmx.asm is assembled with -DPIC, i.e. as position-independent code.
$ eu-readelf -l libavcodec.so.52
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x44966c 0x44966c R E 0x200000
  LOAD           0x449670 0x0000000000649670 0x0000000000649670 0x014c78 0x2b8530 RW  0x200000
  DYNAMIC        0x4510e0 0x00000000006510e0 0x00000000006510e0 0x0001f0 0x0001f0 RW  0x8
  NOTE           0x000190 0x0000000000000190 0x0000000000000190 0x000024 0x000024 R   0x4
  GNU_EH_FRAME   0x4295c0 0x00000000004295c0 0x00000000004295c0 0x005ee4 0x005ee4 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x8
Section to Segment mapping:
Segment Sections...
00 [RO: .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame]
01 .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
02 .dynamic
03 [RO: .note.gnu.build-id]
04 [RO: .eh_frame_hdr]
05
(note the size of the first (text) segment: 0x44966c bytes, mapped R E)
$ eu-readelf -r libavcodec.so.52
Relocation section [ 7] '.rela.dyn' for section [ 0] '' at offset 0x10048 contains 3839 entries:
Offset Type Value Addend Name
0x000000000036b948 X86_64_RELATIVE 000000000000000000 +3582944
0x000000000036b950 X86_64_RELATIVE 000000000000000000 +3583024
0x000000000036b958 X86_64_RELATIVE 000000000000000000 +3583200
...
0x000000000036cba8 X86_64_RELATIVE 000000000000000000 +3590800
0x000000000036cbb0 X86_64_RELATIVE 000000000000000000 +3590864
0x000000000036cbb8 X86_64_RELATIVE 000000000000000000 +3590928
0x00000000006496a0 X86_64_RELATIVE 000000000000000000 +4358183
0x00000000006496b0 X86_64_RELATIVE 000000000000000000 +3716537
0x00000000006496c0 X86_64_RELATIVE 000000000000000000 +3716541
...
Every relocation offset that falls within a segment which is loaded
without write permission indicates a text relocation[1]. Note that all
the relocations at the beginning of the list fall within the first LOAD
segment, which has only Read and Execute permissions.
Let's find where they come from:
$ for addr in `eu-readelf -r libavcodec.so.52 | grep 0x000000000036 | awk '{print $1;}'` ; do eu-addr2line -f -S -e libavcodec.so.52 $addr ; done | grep asm | sort -u
i386/fft_mmx.asm:461
i386/fft_mmx.asm:462
i386/fft_mmx.asm:463
i386/fft_mmx.asm:464
i386/fft_mmx.asm:465
i386/fft_mmx.asm:466
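The grep on the 0x000000000036 prefix works here only because all offending offsets happen to share it. A more general filter would parse each relocation line instead. Below is a minimal Python sketch, assuming the four-column line layout shown in the eu-readelf -r output above; parse_relocations and SAMPLE are names invented for this example.

```python
# Sketch: pull (offset, type, addend) out of `eu-readelf -r` style lines.
# Assumes the layout shown above; parse_relocations is a hypothetical helper.
import re

RELOC_LINE = re.compile(
    r"^\s*(0x[0-9a-fA-F]+)\s+(\S+)\s+([0-9a-fA-F]+)\s+([+-]\d+)"
)

# A few lines copied from the output above, including non-data lines.
SAMPLE = """\
Offset Type Value Addend Name
0x000000000036b948 X86_64_RELATIVE 000000000000000000 +3582944
0x000000000036b950 X86_64_RELATIVE 000000000000000000 +3583024
...
0x00000000006496a0 X86_64_RELATIVE 000000000000000000 +4358183
"""


def parse_relocations(text):
    """Return a list of (offset, type, addend) tuples; other lines are skipped."""
    relocs = []
    for line in text.splitlines():
        m = RELOC_LINE.match(line)
        if m:
            relocs.append((int(m.group(1), 16), m.group(2), int(m.group(4))))
    return relocs


if __name__ == "__main__":
    for offset, rtype, addend in parse_relocations(SAMPLE):
        # For a RELATIVE relocation the addend is the unrelocated target address.
        print(f"{offset:#x} {rtype} -> {addend:#x}")
```

The parsed offsets can then be checked against the segment ranges and passed to eu-addr2line exactly as in the shell loop.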
Loren (or anyone else familiar with the code): is it possible to avoid them?
PS. The same can be done with the equivalent tools from binutils (the same names without the eu- prefix).
[1] http://people.redhat.com/drepper/textrelocs.html
--
MPlayer http://mplayerhq.hu | Livna http://rpm.livna.org
There should be a science of discontent. People need hard times and
oppression to develop psychic muscles.
-- from "Collected Sayings of Muad'Dib" by the Princess Irulan