[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

Michael Niedermayer michael at niedermayer.cc
Wed Jan 6 22:09:57 EET 2021


On Tue, Jan 05, 2021 at 01:31:25PM +0100, Alan Kelly wrote:
> Ping!

This crashes (due to alignment, I think):

(gdb) disassemble $rip-32,$rip+32
Dump of assembler code from 0x5555555730a1 to 0x5555555730e1:
   0x00005555555730a1 <ff_yuv2yuvX_avx2+161>:	int    $0x71
   0x00005555555730a3 <ff_yuv2yuvX_avx2+163>:	out    %al,$0x3
   0x00005555555730a5 <ff_yuv2yuvX_avx2+165>:	vpsraw $0x3,%ymm1,%ymm1
   0x00005555555730aa <ff_yuv2yuvX_avx2+170>:	vpackuswb %ymm4,%ymm3,%ymm3
   0x00005555555730ae <ff_yuv2yuvX_avx2+174>:	vpackuswb %ymm1,%ymm6,%ymm6
   0x00005555555730b2 <ff_yuv2yuvX_avx2+178>:	mov    (%rdi),%rdx
   0x00005555555730b5 <ff_yuv2yuvX_avx2+181>:	vpermq $0xd8,%ymm3,%ymm3
   0x00005555555730bb <ff_yuv2yuvX_avx2+187>:	vpermq $0xd8,%ymm6,%ymm6
=> 0x00005555555730c1 <ff_yuv2yuvX_avx2+193>:	vmovdqa %ymm3,(%rcx,%rax,1)
   0x00005555555730c6 <ff_yuv2yuvX_avx2+198>:	vmovdqa %ymm6,0x20(%rcx,%rax,1)
   0x00005555555730cc <ff_yuv2yuvX_avx2+204>:	add    $0x40,%rax
   0x00005555555730d0 <ff_yuv2yuvX_avx2+208>:	mov    %rdi,%rsi
   0x00005555555730d3 <ff_yuv2yuvX_avx2+211>:	cmp    %r8,%rax
   0x00005555555730d6 <ff_yuv2yuvX_avx2+214>:	jb     0x55555557304d <ff_yuv2yuvX_avx2+77>
   0x00005555555730dc <ff_yuv2yuvX_avx2+220>:	vzeroupper 
   0x00005555555730df <ff_yuv2yuvX_avx2+223>:	retq   
   0x00005555555730e0 <yuv2rgb_c_48+0>:	push   %r15
End of assembler dump.
(gdb) info all-registers 
rax            0x0	0
rbx            0x0	0
rcx            0x55555583f470	93824995292272


[...]
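
The values in the dump above are consistent with the alignment guess. The faulting instruction is vmovdqa %ymm3,(%rcx,%rax,1), and a 256-bit vmovdqa faults unless its memory operand is 32-byte aligned. With rcx = 0x55555583f470 and rax = 0, the store address is 16-byte aligned but not 32-byte aligned. A minimal C sketch of that arithmetic follows; the helper names are hypothetical and are not part of FFmpeg or the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of addr past the previous multiple of
 * align. align must be a power of two; a result of 0 means addr is aligned. */
uint64_t misalignment(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & (align - 1);
}

/* Hypothetical helper: effective address of the faulting store,
 * vmovdqa %ymm3,(%rcx,%rax,1), using rcx and rax copied from the
 * register dump above. */
uint64_t faulting_store_address(void)
{
    const uint64_t rcx = 0x55555583f470ULL;
    const uint64_t rax = 0x0;
    return rcx + rax * 1; /* base + index * scale */
}
```

Here misalignment(faulting_store_address(), 32) evaluates to 16 and misalignment(faulting_store_address(), 16) evaluates to 0, so the destination would satisfy a 16-byte aligned SSE store but not the 32-byte aligned AVX2 one.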
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Modern terrorism, a quick summary: Need oil, start war with country that
has oil, kill hundred thousand in war. Let country fall into chaos,
be surprised about rise of fundamentalists. Drop more bombs, kill more
people, be surprised about them taking revenge and drop even more bombs
and strip your own citizens of their rights and freedoms. To be continued