[FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)
Martin Vignali
martin.vignali at gmail.com
Tue Jan 16 23:26:50 EET 2018
Hello,
Following Henrik Gramner's comments (in the discussion "avfilter/x86/vf_blend :
add avx2 version for 8b func (WIP)"),
attached is a new patch set that adds an AVX2 version of each 8-bit function (except divide):
001 : avutil : add ABS2 for avx2
002 : avfilter : add AVX2 version
For most of the functions, the AVX2 version is a simple modification.
When the processing stays in 8 bits, it mostly comes down to using
VBROADCASTI128 for constant loading.
When the processing uses 16-bit intermediates, I add two macros.
For the load part:
PMOVZXBW : loads mmsize/2 bytes and zero-extends them to 16 bits
(the SSE4 version seems to be slower than the SSE2 "emulation" most of
the time).
Since AVX2 doesn't need a zero-filled vector register,
I add an if/else at the start of each blend macro and change the indices of
the vector registers:
%macro GRAINEXTRACT 0
%if cpuflag(avx2)
BLEND_INIT grainextract, 3
%else ; SSE2
BLEND_INIT grainextract, 4
pxor m3, m3
%endif
For the store part, I add a PACKUSWB_AND_STORE macro,
which simplifies the code of each blend macro.
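As a rough scalar illustration of the 16-bit path described above (this is not code from the patch), the load/compute/store sequence can be modelled in plain C. The sketch assumes grainextract computes clip(128 + A - B), as the C reference in libavfilter/vf_blend.c does to my knowledge; the names pack_u8_saturate and grainextract_row_model are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a signed 16-bit intermediate to 0..255.
 * This mirrors what packuswb does for each 16-bit lane. */
static uint8_t pack_u8_saturate(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Scalar model of one row (hypothetical helper, not an FFmpeg function).
 * Assumed formula: dst = clip(128 + top - bottom). */
static void grainextract_row_model(const uint8_t *top, const uint8_t *bottom,
                                   uint8_t *dst, size_t width)
{
    assert(top && bottom && dst);
    for (size_t x = 0; x < width; x++) {
        int16_t a = top[x];     /* zero-extend 8 -> 16 bits, like pmovzxbw */
        int16_t b = bottom[x];
        int16_t r = (int16_t)(128 + a - b); /* range -127..383 fits in 16 bits */
        dst[x] = pack_u8_saturate(r);       /* narrow with unsigned saturation */
    }
}
```

The SIMD versions do the same thing on many pixels at once: the load macro widens the bytes, the arithmetic runs on 16-bit lanes, and the store macro packs the result back to bytes with saturation.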
This passes the FATE tests for me.
Checkasm results (x86_64, Kaby Lake):
./tests/checkasm/checkasm --test=vf_blend --bench
benchmarking with native FFmpeg timers
nop: 35.7
checkasm: using random seed 3558581064
SSE2:
- vf_blend.8bit [OK]
SSSE3:
- vf_blend.8bit [OK]
AVX2:
- vf_blend.8bit [OK]
checkasm: all 37 tests passed
addition_c: 20523.3
addition_sse2: 441.8
addition_avx2: 383.3
and_c: 14490.3
and_sse2: 485.8
and_avx2: 205.8
average_c: 15600.5
average_sse2: 1206.0
average_avx2: 773.0
darken_c: 27218.0
darken_sse2: 397.3
darken_avx2: 194.3
difference_c: 20607.8
difference_sse2: 980.8
difference_ssse3: 968.0
difference_avx2: 487.0
extremity_c: 17286.0
extremity_sse2: 1174.0
extremity_ssse3: 981.8
extremity_avx2: 550.0
grainextract_c: 22145.3
grainextract_sse2: 1158.5
grainextract_avx2: 771.5
grainmerge_c: 24505.5
grainmerge_sse2: 1158.8
grainmerge_avx2: 774.5
hardmix_c: 16505.5
hardmix_sse2: 490.8
hardmix_avx2: 388.8
lighten_c: 27153.0
lighten_sse2: 485.0
lighten_avx2: 251.3
multiply_c: 16459.8
multiply_sse2: 1382.5
multiply_avx2: 844.0
negation_c: 32143.8
negation_sse2: 1369.0
negation_ssse3: 1175.3
negation_avx2: 522.5
or_c: 13359.5
or_sse2: 397.3
or_avx2: 195.8
phoenix_c: 31159.8
phoenix_sse2: 551.0
phoenix_avx2: 310.5
screen_c: 25372.3
screen_sse2: 1804.0
screen_avx2: 1069.0
subtract_c: 16782.5
subtract_sse2: 478.8
subtract_avx2: 236.5
xor_c: 15374.8
xor_sse2: 491.3
xor_avx2: 237.0
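To read the figures above as relative speedups, one simple approach is to divide the timing of one version by the timing of another. The helper below is a hypothetical sketch for that arithmetic only; it takes the reported numbers at face value and makes no assumption about how checkasm derives them.

```c
/* Ratio of two reported timings (hypothetical helper).
 * A result above 1.0 means `candidate` took less time than `baseline`. */
static double speedup(double baseline, double candidate)
{
    return baseline / candidate;
}
```

For example, speedup(485.8, 205.8) for and_sse2 versus and_avx2 is about 2.36, while speedup(441.8, 383.3) for addition is about 1.15.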
Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-avutil-x86-x86util-add-ABS2-for-AVX2.patch
Type: application/octet-stream
Size: 683 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20180116/f2ba5392/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-avfilter-x86-vf_blend-add-AVX2-version-for-each-func.patch
Type: application/octet-stream
Size: 11380 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20180116/f2ba5392/attachment-0001.obj>