[FFmpeg-devel] [PATCH] avcodec/cfhd: add x86 SIMD
Moritz Barsnick
barsnick at gmx.net
Thu Aug 20 13:53:05 EEST 2020
On Sun, Aug 16, 2020 at 18:25:12 +0200, Paul B Mahol wrote:
> On 8/16/20, Paul B Mahol <onemda at gmail.com> wrote:
> > Please help porting this to linux and 64bit calling convention.
>
> New patch attached.
>
> This one does not allocate stack on x32.
I wanted to benchmark on several machines (newest I have is a Haswell,
I also have an "Intel(R) Atom(TM) CPU D525 @ 1.80GHz" x86_64, and the
below is a Pentium 4 x86), but got stuck on the ancient x86.
Firstly, superficial benchmark result on the Pentium 4:
$ time ffmpeg -i bigger_res.mov -map 0:v -f null -
Without patchset: speed=0.0331x (plus/minus a bit)
With patchset: speed=0.0577x (plus/minus a bit)
I'll add benchmarks with my other systems, if desired.
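For a rough sense of scale, the two speed= figures above can be turned into a ratio. This is only a sketch: both inputs were reported as "plus/minus a bit", so the result is approximate, and the variable names are illustrative.

```python
# speed= values reported by ffmpeg on the Pentium 4 (copied from the lines above).
speed_without = 0.0331  # without the patchset
speed_with = 0.0577     # with the patchset

# Ratio of the two throughput figures; approximate, since both inputs are noisy.
speedup = speed_with / speed_without
print(f"speedup: {speedup:.2f}x")  # prints "speedup: 1.74x"
```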
Alas, with the patchset, the following command quickly terminates with
Illegal instruction in ff_cfhd_horiz_filter_clip10_sse2 ():
$ ffmpeg -i MT_BeartoothHighway_1min_Cineform.avi -map 0:v -f null -
(It obviously does not crash with "-cpuflags 0", or without the
patchset.)
See assembler dump below.
Compiler: icc (ICC) 14.0.3 20140422
Assembler: nasm-2.13.02
Assembly dump from gdb:
Dump of assembler code from 0x919572f to 0x919576f:
0x0919572f <ff_cfhd_horiz_filter_clip10_sse2+47>: movl $0xbf0f03ff,(%ecx,%eax,8)
0x09195736 <ff_cfhd_horiz_filter_clip10_sse2+54>: xor (%ecx),%al
0x09195738 <ff_cfhd_horiz_filter_clip10_sse2+56>: not %ecx
0x0919573a <ff_cfhd_horiz_filter_clip10_sse2+58>: jmp *0xf(%esi)
0x0919573d <ff_cfhd_horiz_filter_clip10_sse2+61>: outsb %ds:(%esi),(%dx)
0x0919573e <ff_cfhd_horiz_filter_clip10_sse2+62>: (bad)
0x0919573f <ff_cfhd_horiz_filter_clip10_sse2+63>: pmaxsw 0x99e6da0,%xmm0
0x09195747 <ff_cfhd_horiz_filter_clip10_sse2+71>: pminsw 0x99e6db0,%xmm0
=> 0x0919574f <ff_cfhd_horiz_filter_clip10_sse2+79>: pextrw $0x0,%xmm0,(%eax)
0x09195755 <ff_cfhd_horiz_filter_clip10_sse2+85>: movswl (%ecx),%esi
0x09195758 <ff_cfhd_horiz_filter_clip10_sse2+88>: imul $0x5,%esi,%esi
0x0919575b <ff_cfhd_horiz_filter_clip10_sse2+91>: movswl 0x2(%ecx),%edi
0x0919575f <ff_cfhd_horiz_filter_clip10_sse2+95>: imul $0x4,%edi,%edi
0x09195762 <ff_cfhd_horiz_filter_clip10_sse2+98>: add %esi,%edi
0x09195764 <ff_cfhd_horiz_filter_clip10_sse2+100>: movswl 0x4(%ecx),%esi
0x09195768 <ff_cfhd_horiz_filter_clip10_sse2+104>: sub %esi,%edi
0x0919576a <ff_cfhd_horiz_filter_clip10_sse2+106>: add $0x4,%edi
0x0919576d <ff_cfhd_horiz_filter_clip10_sse2+109>: sar $0x3,%edi
End of assembler dump.
CPU info:
barsnick at sunshine:~ > hwinfo --cpu
01: None 00.0: 10103 CPU
[Created at cpu.457]
Unique ID: rdCR.j8NaKXDZtZ6
Hardware Class: cpu
Arch: Intel
Vendor: "GenuineIntel"
Model: 15.2.9 "Intel(R) Pentium(R) 4 CPU 2.80GHz"
Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,pebs,bts,cid,xtpr
Clock: 2800 MHz
BogoMips: 5597.27
Cache: 512 kb
Units/Processor: 2
Config Status: cfg=new, avail=yes, need=no, active=unknown
02: None 01.0: 10103 CPU
[Created at cpu.457]
Unique ID: wkFv.j8NaKXDZtZ6
Hardware Class: cpu
Arch: Intel
Vendor: "GenuineIntel"
Model: 15.2.9 "Intel(R) Pentium(R) 4 CPU 2.80GHz"
Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,pebs,bts,cid,xtpr
Clock: 2800 MHz
BogoMips: 27198.67
Cache: 512 kb
Units/Processor: 2
Config Status: cfg=new, avail=yes, need=no, active=unknown
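One possible reading of the dump, not confirmed anywhere in this thread: the faulting instruction is pextrw with a memory destination, which is an SSE4.1 encoding; SSE2 only provides the form that extracts into a general-purpose register. The sketch below checks the feature list above for the relevant flags. The flag name "sse4_1" follows the Linux /proc/cpuinfo spelling and is an assumption about how hwinfo would report it.

```python
# Feature list copied verbatim from the hwinfo output above.
features = (
    "fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,mtrr,pge,mca,cmov,pat,pse36,"
    "clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,pebs,bts,cid,xtpr"
)
flags = set(features.split(","))

# "sse4_1" is the assumed spelling of the SSE4.1 flag (as in /proc/cpuinfo).
for flag in ("sse2", "sse4_1"):
    print(f"{flag}: {'yes' if flag in flags else 'no'}")
```

On this feature list the check reports sse2 as present and sse4_1 as absent, which would be consistent with an SSE4.1-only encoding ending up in a function built for SSE2.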
Cheers,
Moritz