[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.
Michael Niedermayer
michael at niedermayer.cc
Wed Jan 6 22:09:57 EET 2021
On Tue, Jan 05, 2021 at 01:31:25PM +0100, Alan Kelly wrote:
> Ping!
It crashes (due to alignment, I think):
(gdb) disassemble $rip-32,$rip+32
Dump of assembler code from 0x5555555730a1 to 0x5555555730e1:
0x00005555555730a1 <ff_yuv2yuvX_avx2+161>: int $0x71
0x00005555555730a3 <ff_yuv2yuvX_avx2+163>: out %al,$0x3
0x00005555555730a5 <ff_yuv2yuvX_avx2+165>: vpsraw $0x3,%ymm1,%ymm1
0x00005555555730aa <ff_yuv2yuvX_avx2+170>: vpackuswb %ymm4,%ymm3,%ymm3
0x00005555555730ae <ff_yuv2yuvX_avx2+174>: vpackuswb %ymm1,%ymm6,%ymm6
0x00005555555730b2 <ff_yuv2yuvX_avx2+178>: mov (%rdi),%rdx
0x00005555555730b5 <ff_yuv2yuvX_avx2+181>: vpermq $0xd8,%ymm3,%ymm3
0x00005555555730bb <ff_yuv2yuvX_avx2+187>: vpermq $0xd8,%ymm6,%ymm6
=> 0x00005555555730c1 <ff_yuv2yuvX_avx2+193>: vmovdqa %ymm3,(%rcx,%rax,1)
0x00005555555730c6 <ff_yuv2yuvX_avx2+198>: vmovdqa %ymm6,0x20(%rcx,%rax,1)
0x00005555555730cc <ff_yuv2yuvX_avx2+204>: add $0x40,%rax
0x00005555555730d0 <ff_yuv2yuvX_avx2+208>: mov %rdi,%rsi
0x00005555555730d3 <ff_yuv2yuvX_avx2+211>: cmp %r8,%rax
0x00005555555730d6 <ff_yuv2yuvX_avx2+214>: jb 0x55555557304d <ff_yuv2yuvX_avx2+77>
0x00005555555730dc <ff_yuv2yuvX_avx2+220>: vzeroupper
0x00005555555730df <ff_yuv2yuvX_avx2+223>: retq
0x00005555555730e0 <yuv2rgb_c_48+0>: push %r15
End of assembler dump.
(gdb) info all-registers
rax 0x0 0
rbx 0x0 0
rcx 0x55555583f470 93824995292272
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Modern terrorism, a quick summary: Need oil, start war with country that
has oil, kill hundred thousand in war. Let country fall into chaos,
be surprised about rise of fundamentalists. Drop more bombs, kill more
people, be surprised about them taking revenge and drop even more bombs
and strip your own citizens of their rights and freedoms. to be continued