[FFmpeg-devel] avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)
Martin Vignali
martin.vignali at gmail.com
Sat Dec 9 20:11:52 EET 2017
Hello,
Attached are patches adding an AVX2 version of each 8-bit func (except divide):
001 : avutil : add ABS2 for AVX2
002 : avfilter : add AVX2 version
For most of the funcs, the AVX2 version is a simple modification
(VBROADCASTI128 for constant loading) when the processing stays in 8 bits.
When the processing uses 16-bit intermediates (where the load uses movh,
a 64-bit load), I created macros (someone will probably have a better idea
for the names of these new macros).
The idea in AVX2 is to load 128 bits of data (2 x 64 bits), then shuffle
across lanes so that each 64-bit half ends up in the low part of one lane.
This keeps the rest of the processing similar to the SSE version.
For the store, the idea is the same in the opposite direction (shuffle
before store).
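As a rough illustration of the load/shuffle/store idea described above, here is a small Python model (not the assembly from the patch). It treats a 256-bit register as four 64-bit quadwords and assumes the cross-lane shuffle behaves like a vpermq-style permute with selector 0xD8. The macro and immediate actually used in the patch are not shown in this mail, so the helper names and the selector are assumptions made only for this sketch.

```python
# Model of a 256-bit register as four 64-bit quadwords: [q0, q1 | q2, q3],
# where q0..q1 form the low 128-bit lane and q2..q3 the high lane.
# All names below are hypothetical; they do not come from the patch.

def permute_qwords(reg, imm):
    """vpermq-style shuffle model: result[i] = reg[(imm >> (2 * i)) & 3]."""
    return [reg[(imm >> (2 * i)) & 3] for i in range(4)]

def load_for_16bit_processing(pixels16):
    """Load 16 bytes (2 x 64 bits), then move one 64-bit half into the low part of each lane."""
    assert len(pixels16) == 16
    lo, hi = bytes(pixels16[:8]), bytes(pixels16[8:])
    reg = [lo, hi, None, None]          # 128-bit load: the high lane holds no data yet
    return permute_qwords(reg, 0xD8)    # gives [lo, None, hi, None]

def store_after_16bit_processing(reg):
    """Opposite direction: gather the low half of each lane, then store 128 bits."""
    packed = permute_qwords(reg, 0xD8)  # 0xD8 swaps q1 and q2, so it is its own inverse
    return packed[0] + packed[1]

data = bytes(range(16))
reg = load_for_16bit_processing(data)
restored = store_after_16bit_processing(reg)
```

With each 64-bit half sitting in the low part of its own lane, a per-lane 8-bit to 16-bit unpack can then proceed as it would on a 128-bit register, which is why the rest of the processing can stay close to the SSE version.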
The speed improvement is not very significant for these funcs
(grainextract, multiply, screen, average, grainmerge); I'm not sure the
AVX2 version is needed for them (except for screen).
Checkasm results (x86_64, Kaby Lake):
./tests/checkasm/checkasm --test=vf_blend --bench
benchmarking with native FFmpeg timers
nop: 36.2
checkasm: using random seed 2027036350
SSE2:
- vf_blend.8bit [OK]
SSSE3:
- vf_blend.8bit [OK]
AVX2:
- vf_blend.8bit [OK]
checkasm: all 37 tests passed
addition_c: 21882.7
addition_sse2: 483.9
addition_avx2: 250.9
and_c: 15336.7
and_sse2: 421.9
and_avx2: 196.7
average_c: 15640.7
average_sse2: 1160.7
average_avx2: 1155.7
darken_c: 27204.7
darken_sse2: 486.7
darken_avx2: 251.9
difference_c: 17101.9
difference_sse2: 981.2
difference_ssse3: 965.4
difference_avx2: 514.2
extremity_c: 27748.9
extremity_sse2: 1174.4
extremity_ssse3: 983.7
extremity_avx2: 520.4
grainextract_c: 22755.9
grainextract_sse2: 1158.2
grainextract_avx2: 1152.9
grainmerge_c: 26173.9
grainmerge_sse2: 1156.9
grainmerge_avx2: 1153.9
hardmix_c: 15676.9
hardmix_sse2: 458.4
hardmix_avx2: 268.7
lighten_c: 27137.4
lighten_sse2: 422.2
lighten_avx2: 194.2
multiply_c: 16449.9
multiply_sse2: 1378.9
multiply_avx2: 1158.7
negation_c: 17372.9
negation_sse2: 1439.4
negation_ssse3: 1172.4
negation_avx2: 520.4
or_c: 14116.2
or_sse2: 483.9
or_avx2: 236.4
phoenix_c: 30905.9
phoenix_sse2: 553.7
phoenix_avx2: 388.7
screen_c: 20414.7
screen_sse2: 1803.9
screen_avx2: 1257.4
subtract_c: 20596.2
subtract_sse2: 439.7
subtract_avx2: 403.7
xor_c: 15380.7
xor_sse2: 445.7
xor_avx2: 405.2
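To put the SSE2 versus AVX2 numbers above in perspective, the following Python snippet computes the speedup ratio for a few of the funcs. The timings are copied from the checkasm output above; the dictionary layout and the helper name are only illustrative.

```python
# (sse2, avx2) timings copied from the checkasm output above.
timings = {
    "addition":     (483.9, 250.9),
    "average":      (1160.7, 1155.7),
    "grainextract": (1158.2, 1152.9),
    "grainmerge":   (1156.9, 1153.9),
    "multiply":     (1378.9, 1158.7),
    "screen":       (1803.9, 1257.4),
}

def speedup(sse2, avx2):
    """Ratio of SSE2 time to AVX2 time; values near 1.0 mean no real gain."""
    return sse2 / avx2

for name, (sse2, avx2) in timings.items():
    print(f"{name:13s} {speedup(sse2, avx2):.2f}x")
```

Ratios close to 1.0 (average, grainextract, grainmerge) correspond to the funcs for which the gain is described as not significant.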
Comments welcome.
Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-avutil-x86-x86util-add-ABS2-for-AVX2.patch
Type: application/octet-stream
Size: 682 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20171209/5fa9c29c/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-avfilter-x86-vf_blend-add-AVX2-version.patch
Type: application/octet-stream
Size: 10896 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20171209/5fa9c29c/attachment-0001.obj>