[FFmpeg-devel] [PATCH] Higher bit-depth x86 SIMD assembly for yadif
James Darnley
james.darnley at gmail.com
Thu Jan 19 20:55:58 CET 2012
Attached are five patches which add code for:
mmx to sse4 instruction sets for 15 and 16 bits per sample
mmx to ssse3 instruction sets for 9 to 14 bits per sample
actual support of 9 bits per sample
I know that 11 to 15 bits per sample don't exist at present but support
might be added since h264 allows up to 14 bits per sample. Anyway, all
the code added here is used for existing features.
Below, I have copied the commit messages for convenience.
Something else to think about: the clarity of the source code could be
greatly improved by using yasm and its preprocessor. I wonder how much
abstraction it would need to roll the source of all three functions
together, and whether that would reduce the source code size.
Subject: [PATCH 1/5] x86 SIMD for 16 bits per sample in yadif
It might be a rather dumb copy of the 8-bit SIMD but it works and
produces identical output to the C. The MMX and SSE2 versions have been
tested on my Athlon64. The SSSE3 and SSE4.1 versions need testing and
benching elsewhere.
Benchmarks on the Athlon64 using a 704px wide video, per line:
1693075 decicycles in C, 521977 runs, 2311 skips
1029468 decicycles in mmx, 523347 runs, 941 skips
730504 decicycles in sse2, 523474 runs, 814 skips
Subject: [PATCH 2/5] x86 SIMD for 9 to 14 bits per sample in yadif
These lower bit depths do not need unpacking to double words, which lets
the code process more pixels per iteration (still 2 in mmx but 6 in sse2)
and avoids having to emulate the double word instructions that are
missing from older instruction sets.
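To illustrate why the 9 to 14 bit case can stay in 16-bit words, here is a
minimal C sketch (not taken from the patch). It assumes the widest
intermediate the filter needs is the sum of two samples held in a signed
word. That assumption, and the function names max_sample and
fits_in_signed_word, are hypothetical and used for illustration only.

```c
#include <stdint.h>

/* Largest sample value at a given bit depth, e.g. 16383 for 14 bits. */
int32_t max_sample(int bits)
{
    return ((int32_t)1 << bits) - 1;
}

/*
 * Hypothetical headroom check: does the sum of two maximal samples still
 * fit in a signed 16-bit lane? ASSUMPTION: a two-sample sum is the widest
 * intermediate value; the real filter may need more or less headroom.
 */
int fits_in_signed_word(int bits)
{
    return 2 * max_sample(bits) <= INT16_MAX;
}
```

Under that assumption, depths 9 through 14 pass the check while 15 and 16
do not, which is consistent with the split between the two patches.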
Benchmarks on my Athlon64 using a 704 pixel wide video, per line:
1695927 decicycles in C, 260986 runs, 1158 skips
854770 decicycles in mmx, 261717 runs, 427 skips
440202 decicycles in sse2, 261829 runs, 315 skips
That works out at:
mmx - 1.20 times as fast as the 16-bit version
sse2 - 1.66 times as fast as the 16-bit version
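These ratios can be reproduced from the per-line figures quoted in the two
commit messages. A minimal C sketch, where the helper names speedup and
print_speedups are hypothetical:

```c
#include <stdio.h>

/* Ratio of the slower timing to the faster one, both in decicycles. */
double speedup(double slow_decicycles, double fast_decicycles)
{
    return slow_decicycles / fast_decicycles;
}

/* Compare the 16-bit timings with the 9 to 14 bit timings quoted above. */
void print_speedups(void)
{
    printf("mmx:  %.2f\n", speedup(1029468, 854770)); /* prints 1.20 */
    printf("sse2: %.2f\n", speedup(730504, 440202));  /* prints 1.66 */
}
```

The same helper can be applied to the C timings to get the speedup of
each SIMD version over the reference code.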
Subject: [PATCH 3/5] cosmetic indent
Subject: [PATCH 4/5] Actually support 9-bit YUV in yadif
Subject: [PATCH 5/5] Update copyright headers in yadif related files
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-x86-SIMD-for-16-bits-per-sample-in-yadif.patch
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120119/cd78453c/attachment.ksh>
-------------- next part --------------
Name: 0002-x86-SIMD-for-9-to-14-bits-per-sample-in-yadif.patch
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120119/cd78453c/attachment-0001.ksh>
-------------- next part --------------
Name: 0003-cosmetic-indent.patch
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120119/cd78453c/attachment-0002.ksh>
-------------- next part --------------
Name: 0004-Actually-support-9-bit-YUV-in-yadif.patch
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120119/cd78453c/attachment-0003.ksh>
-------------- next part --------------
Name: 0005-Update-copyright-headers-in-yadif-related-files.patch
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120119/cd78453c/attachment-0004.ksh>